D47crunch

Standardization and analytical error propagation of Δ47 and Δ48 clumped-isotope measurements

Process and standardize carbonate and/or CO2 clumped-isotope analyses, from low-level data out of a dual-inlet mass spectrometer to final, “absolute” Δ47, Δ48 and Δ49 values with fully propagated analytical error estimates (Daëron, 2021).

The tutorial section takes you through a series of simple steps to import/process data and print out the results. The how-to section provides instructions applicable to various specific tasks.

1. Tutorial

1.1 Installation

The easy option is to use pip; open a shell terminal and simply type:

python -m pip install D47crunch
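
You can then check that the package is importable and up to date by printing out its version string (the module defines a __version__ attribute):

python -c "import D47crunch; print(D47crunch.__version__)"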

Those wishing to experiment with the bleeding-edge development version can do so through the following steps:

  1. Download the dev branch source code here and rename it to D47crunch.py.
  2. Do any of the following:
    • copy D47crunch.py to somewhere in your Python path
    • copy D47crunch.py to a working directory (import D47crunch will only work if called within that directory)
    • copy D47crunch.py to any other location (e.g., /foo/bar) and then use the following code snippet in your own code to import D47crunch:
import sys
sys.path.append('/foo/bar')
import D47crunch

Documentation for the development version can be downloaded here (save html file and open it locally).

1.2 Usage

Start by creating a file named rawdata.csv with the following contents:

UID,  Sample,           d45,       d46,        d47,        d48,       d49
A01,  ETH-1,        5.79502,  11.62767,   16.89351,   24.56708,   0.79486
A02,  MYSAMPLE-1,   6.21907,  11.49107,   17.27749,   24.58270,   1.56318
A03,  ETH-2,       -6.05868,  -4.81718,  -11.63506,  -10.32578,   0.61352
A04,  MYSAMPLE-2,  -3.86184,   4.94184,    0.60612,   10.52732,   0.57118
A05,  ETH-3,        5.54365,  12.05228,   17.40555,   25.96919,   0.74608
A06,  ETH-2,       -6.06706,  -4.87710,  -11.69927,  -10.64421,   1.61234
A07,  ETH-1,        5.78821,  11.55910,   16.80191,   24.56423,   1.47963
A08,  MYSAMPLE-2,  -3.87692,   4.86889,    0.52185,   10.40390,   1.07032

Then instantiate a D47data object which will store and process this data:

import D47crunch
mydata = D47crunch.D47data()

For now, this object is empty:

>>> print(mydata)
[]

To load the analyses saved in rawdata.csv into our D47data object and process the data:

mydata.read('rawdata.csv')

# compute δ13C, δ18O of working gas:
mydata.wg()

# compute δ13C, δ18O, raw Δ47 values for each analysis:
mydata.crunch()

# compute absolute Δ47 values for each analysis
# as well as average Δ47 values for each sample:
mydata.standardize()

We can now print a summary of the data processing:

>>> mydata.summary(verbose = True, save_to_file = False)
[summary]        
–––––––––––––––––––––––––––––––  –––––––––
N samples (anchors + unknowns)   5 (3 + 2)
N analyses (anchors + unknowns)  8 (5 + 3)
Repeatability of δ13C_VPDB         4.2 ppm
Repeatability of δ18O_VSMOW       47.5 ppm
Repeatability of Δ47 (anchors)    13.4 ppm
Repeatability of Δ47 (unknowns)    2.5 ppm
Repeatability of Δ47 (all)         9.6 ppm
Model degrees of freedom                 3
Student's 95% t-factor                3.18
Standardization method              pooled
–––––––––––––––––––––––––––––––  –––––––––

This tells us that our data set contains 5 different samples: 3 anchors (ETH-1, ETH-2, ETH-3) and 2 unknowns (MYSAMPLE-1, MYSAMPLE-2). The total number of analyses is 8, with 5 anchor analyses and 3 unknown analyses. We get an estimate of the analytical repeatability (i.e. the overall, pooled standard deviation) for δ13C, δ18O and Δ47, as well as the number of degrees of freedom (here, 3) that these estimated standard deviations are based on, along with the corresponding Student's t-factor (here, 3.18) for 95 % confidence limits. Finally, the summary indicates that we used a “pooled” standardization approach (see [Daëron, 2021]).
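
The reported t-factor can be verified independently (a quick sketch using scipy, which D47crunch itself depends on; two-sided 95 % limits correspond to the 97.5th percentile of Student's t distribution):

from scipy.stats import t as tstudent

# 95 % two-sided limits with 3 degrees of freedom:
print(round(tstudent.ppf(1 - 0.05/2, 3), 2))
# 3.18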

To see the actual results:

>>> mydata.table_of_samples(verbose = True, save_to_file = False)
[table_of_samples] 
––––––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––
Sample      N  d13C_VPDB  d18O_VSMOW     D47      SE    95% CL      SD  p_Levene
––––––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––
ETH-1       2       2.01       37.01  0.2052                    0.0131          
ETH-2       2     -10.17       19.88  0.2085                    0.0026          
ETH-3       1       1.73       37.49  0.6132                                    
MYSAMPLE-1  1       2.48       36.90  0.2996  0.0091  ± 0.0291                  
MYSAMPLE-2  2      -8.17       30.05  0.6600  0.0115  ± 0.0366  0.0025          
––––––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––

This table lists, for each sample, the number of analytical replicates, average δ13C and δ18O values (for the analyte CO2, not for the carbonate itself), the average Δ47 value, and the SD of Δ47 for all replicates of this sample. For unknown samples, the SE and 95 % confidence limits for mean Δ47 are also listed. These 95 % CL take into account the number of degrees of freedom of the regression model, so that in large data sets the 95 % CL will tend to 1.96 times the SE; in this small data set, however, the applicable t-factor is much larger (e.g., for MYSAMPLE-1: 3.18 × 0.0091 ≈ ± 0.029).

We can also generate a table of all analyses in the data set (again, note that d18O_VSMOW is the composition of the CO2 analyte):

>>> mydata.table_of_analyses(verbose = True, save_to_file = False)
[table_of_analyses] 
–––  –––––––––  ––––––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––
UID    Session      Sample  d13Cwg_VPDB  d18Owg_VSMOW        d45        d46         d47         d48       d49   d13C_VPDB  d18O_VSMOW     D47raw     D48raw      D49raw       D47
–––  –––––––––  ––––––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––
A01  mySession       ETH-1       -3.807        24.921   5.795020  11.627670   16.893510   24.567080  0.794860    2.014086   37.041843  -0.574686   1.149684  -27.690250  0.214454
A02  mySession  MYSAMPLE-1       -3.807        24.921   6.219070  11.491070   17.277490   24.582700  1.563180    2.476827   36.898281  -0.499264   1.435380  -27.122614  0.299589
A03  mySession       ETH-2       -3.807        24.921  -6.058680  -4.817180  -11.635060  -10.325780  0.613520  -10.166796   19.907706  -0.685979  -0.721617   16.716901  0.206693
A04  mySession  MYSAMPLE-2       -3.807        24.921  -3.861840   4.941840    0.606120   10.527320  0.571180   -8.159927   30.087230  -0.248531   0.613099   -4.979413  0.658270
A05  mySession       ETH-3       -3.807        24.921   5.543650  12.052280   17.405550   25.969190  0.746080    1.727029   37.485567  -0.226150   1.678699  -28.280301  0.613200
A06  mySession       ETH-2       -3.807        24.921  -6.067060  -4.877100  -11.699270  -10.644210  1.612340  -10.173599   19.845192  -0.683054  -0.922832   17.861363  0.210328
A07  mySession       ETH-1       -3.807        24.921   5.788210  11.559100   16.801910   24.564230  1.479630    2.009281   36.970298  -0.591129   1.282632  -26.888335  0.195926
A08  mySession  MYSAMPLE-2       -3.807        24.921  -3.876920   4.868890    0.521850   10.403900  1.070320   -8.173486   30.011134  -0.245768   0.636159   -4.324964  0.661803
–––  –––––––––  ––––––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––

2. How-to

2.1 Simulate a virtual data set to play with

It is sometimes convenient to quickly build a virtual data set of analyses, for instance to assess the final analytical precision achievable for a given combination of anchor and unknown analyses (see also Fig. 6 of Daëron, 2021).

This can be achieved with virtual_data(). The example below creates a dataset with four sessions, each of which comprises three analyses of anchor ETH-1, three of ETH-2, three of ETH-3, and three analyses each of two unknown samples named FOO and BAR with an arbitrarily defined isotopic composition. Analytical repeatabilities for Δ47 and Δ48 are also specified arbitrarily. See the virtual_data() documentation for additional configuration parameters.

from D47crunch import virtual_data, D47data

args = dict(
    samples = [
        dict(Sample = 'ETH-1', N = 3),
        dict(Sample = 'ETH-2', N = 3),
        dict(Sample = 'ETH-3', N = 3),
        dict(Sample = 'FOO', N = 3,
            d13C_VPDB = -5., d18O_VPDB = -10.,
            D47 = 0.3, D48 = 0.15),
        dict(Sample = 'BAR', N = 3,
            d13C_VPDB = -15., d18O_VPDB = -2.,
            D47 = 0.6, D48 = 0.2),
        ], rD47 = 0.010, rD48 = 0.030)

session1 = virtual_data(session = 'Session_01', **args, seed = 123)
session2 = virtual_data(session = 'Session_02', **args, seed = 1234)
session3 = virtual_data(session = 'Session_03', **args, seed = 12345)
session4 = virtual_data(session = 'Session_04', **args, seed = 123456)

D = D47data(session1 + session2 + session3 + session4)

D.crunch()
D.standardize()

D.table_of_sessions(verbose = True, save_to_file = False)
D.table_of_samples(verbose = True, save_to_file = False)
D.table_of_analyses(verbose = True, save_to_file = False)

2.2 Control data quality

D47crunch offers several tools to visualize processed data. The examples below use the same virtual data set, generated with:

from D47crunch import *
from random import shuffle

# generate virtual data:
args = dict(
    samples = [
        dict(Sample = 'ETH-1', N = 8),
        dict(Sample = 'ETH-2', N = 8),
        dict(Sample = 'ETH-3', N = 8),
        dict(Sample = 'FOO', N = 4,
            d13C_VPDB = -5., d18O_VPDB = -10.,
            D47 = 0.3, D48 = 0.15),
        dict(Sample = 'BAR', N = 4,
            d13C_VPDB = -15., d18O_VPDB = -15.,
            D47 = 0.5, D48 = 0.2),
        ])

sessions = [
    virtual_data(session = f'Session_{k+1:02.0f}', seed = 123456+k, **args)
    for k in range(10)]

# shuffle the data:
data = [r for s in sessions for r in s]
shuffle(data)
data = sorted(data, key = lambda r: r['Session'])

# create D47data instance:
data47 = D47data(data)

# process D47data instance:
data47.crunch()
data47.standardize()

2.2.1 Plotting the distribution of analyses through time

data47.plot_distribution_of_analyses(filename = 'time_distribution.pdf')

time_distribution.png

The plot above shows the succession of analyses as if they were all distributed at regular time intervals. See D4xdata.plot_distribution_of_analyses() for how to plot analyses as a function of “true” time (based on the TimeTag for each analysis).
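
For instance, one may assign a TimeTag to each analysis before plotting (a minimal sketch: the TimeTag field is named in the documentation, but the vs_time keyword and the arbitrary tag values below are assumptions to be checked against the D4xdata.plot_distribution_of_analyses() documentation):

# assign a hypothetical TimeTag (e.g., decimal days since session start) to each analysis:
for k, r in enumerate(data47):
    r['TimeTag'] = 0.5 * k  # assumption: one analysis every 12 hours

# plot as a function of "true" time rather than sequentially:
data47.plot_distribution_of_analyses(filename = 'time_distribution_vs_time.pdf', vs_time = True)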

2.2.2 Generating session plots

data47.plot_sessions()

Below is one of the resulting session plots. Each cross marker is an analysis. Anchors are in red and unknowns in blue. Short horizontal lines show the nominal Δ47 value for anchors, in red, or the average Δ47 value for unknowns, in blue (overall average for all sessions). Curved grey contours correspond to Δ47 standardization errors in this session.

D47_plot_Session_03.png

2.2.3 Plotting Δ47 or Δ48 residuals

data47.plot_residuals(filename = 'residuals.pdf', kde = True)

residuals.png

Again, note that this plot only shows the succession of analyses as if they were all distributed at regular time intervals.

2.2.4 Checking δ13C and δ18O dispersion

mydata = D47data(virtual_data(
    session = 'mysession',
    samples = [
        dict(Sample = 'ETH-1', N = 4),
        dict(Sample = 'ETH-2', N = 4),
        dict(Sample = 'ETH-3', N = 4),
        dict(Sample = 'MYSAMPLE', N = 8, D47 = 0.6, D48 = 0.1, d13C_VPDB = -4.0, d18O_VPDB = -12.0),
    ], seed = 123))

mydata.refresh()
mydata.wg()
mydata.crunch()
mydata.plot_bulk_compositions()

D4xdata.plot_bulk_compositions() produces a series of plots, one for each sample, and an additional plot with all samples together. For example, here is the plot for sample MYSAMPLE:

bulk_compositions.png

2.3 Use a different set of anchors, change anchor nominal values, and/or change oxygen-17 correction parameters

Nominal values for various carbonate standards are defined in four places:

    • D4xdata.Nominal_d13C_VPDB
    • D4xdata.Nominal_d18O_VPDB
    • D47data.Nominal_D47
    • D48data.Nominal_D48

17O correction parameters are defined by:

    • D4xdata.R13_VPDB
    • D4xdata.R17_VSMOW
    • D4xdata.R18_VSMOW
    • D4xdata.LAMBDA_17
    • D4xdata.R18_VPDB
    • D4xdata.R17_VPDB

When creating a new instance of D47data or D48data, the current values of these variables are copied as properties of the new object. Applying custom values for, e.g., R17_VSMOW and Nominal_D47 can thus be done in several ways:

Option 1: by redefining D4xdata.R17_VSMOW and D47data.Nominal_D47 before creating a D47data object:

from D47crunch import D4xdata, D47data

# redefine R17_VSMOW:
D4xdata.R17_VSMOW = 0.00037 # new value

# redefine R17_VPDB for consistency:
D4xdata.R17_VPDB = D4xdata.R17_VSMOW * (D4xdata.R18_VPDB/D4xdata.R18_VSMOW) ** D4xdata.LAMBDA_17

# edit Nominal_D47 to only include ETH-1/2/3:
D47data.Nominal_D4x = {
    a: D47data.Nominal_D4x[a]
    for a in ['ETH-1', 'ETH-2', 'ETH-3']
    }
# redefine ETH-3:
D47data.Nominal_D4x['ETH-3'] = 0.600

# only now create D47data object:
mydata = D47data()

# check the results:
print(mydata.R17_VSMOW, mydata.R17_VPDB)
print(mydata.Nominal_D47)
# NB: mydata.Nominal_D47 is just an alias for mydata.Nominal_D4x

# should print out:
# 0.00037 0.00037599710894149464
# {'ETH-1': 0.2052, 'ETH-2': 0.2085, 'ETH-3': 0.6}

Option 2: by redefining R17_VSMOW and Nominal_D47 after creating a D47data object:

from D47crunch import D47data

# first create D47data object:
mydata = D47data()

# redefine R17_VSMOW:
mydata.R17_VSMOW = 0.00037 # new value

# redefine R17_VPDB for consistency:
mydata.R17_VPDB = mydata.R17_VSMOW * (mydata.R18_VPDB/mydata.R18_VSMOW) ** mydata.LAMBDA_17

# edit Nominal_D47 to only include ETH-1/2/3:
mydata.Nominal_D47 = {
    a: mydata.Nominal_D47[a]
    for a in ['ETH-1', 'ETH-2', 'ETH-3']
    }
# redefine ETH-3:
mydata.Nominal_D47['ETH-3'] = 0.600

# check the results:
print(mydata.R17_VSMOW, mydata.R17_VPDB)
print(mydata.Nominal_D47)

# should print out:
# 0.00037 0.00037599710894149464
# {'ETH-1': 0.2052, 'ETH-2': 0.2085, 'ETH-3': 0.6}

The two options above are equivalent, but the latter provides a simple way to compare different data processing choices:

from D47crunch import D47data

# create two D47data objects:
foo = D47data()
bar = D47data()

# modify foo in various ways:
foo.LAMBDA_17 = 0.52
foo.R17_VSMOW = 0.00037 # new value
foo.R17_VPDB = foo.R17_VSMOW * (foo.R18_VPDB/foo.R18_VSMOW) ** foo.LAMBDA_17
foo.Nominal_D47 = {
    'ETH-1': foo.Nominal_D47['ETH-1'],
    'ETH-2': foo.Nominal_D47['ETH-2'],
    'IAEA-C2': foo.Nominal_D47['IAEA-C2'],
    'INLAB_REF_MATERIAL': 0.666,
    }

# now import the same raw data into foo and bar:
foo.read('rawdata.csv')
foo.wg()          # compute δ13C, δ18O of working gas
foo.crunch()      # compute all δ13C, δ18O and raw Δ47 values
foo.standardize() # compute absolute Δ47 values

bar.read('rawdata.csv')
bar.wg()          # compute δ13C, δ18O of working gas
bar.crunch()      # compute all δ13C, δ18O and raw Δ47 values
bar.standardize() # compute absolute Δ47 values

# and compare the final results:
foo.table_of_samples(verbose = True, save_to_file = False)
bar.table_of_samples(verbose = True, save_to_file = False)

2.4 Process paired Δ47 and Δ48 values

Purely in terms of data processing, it is not obvious why Δ47 and Δ48 data should not be handled separately. For now, D47crunch uses two independent classes — D47data and D48data — which crunch numbers and deal with standardization in very similar ways. The following example demonstrates how to print out combined outputs for D47data and D48data.

from D47crunch import *

# generate virtual data:
args = dict(
    samples = [
        dict(Sample = 'ETH-1', N = 3),
        dict(Sample = 'ETH-2', N = 3),
        dict(Sample = 'ETH-3', N = 3),
        dict(Sample = 'FOO', N = 3,
            d13C_VPDB = -5., d18O_VPDB = -10.,
            D47 = 0.3, D48 = 0.15),
        ], rD47 = 0.010, rD48 = 0.030)

session1 = virtual_data(session = 'Session_01', **args)
session2 = virtual_data(session = 'Session_02', **args)

# create D47data instance:
data47 = D47data(session1 + session2)

# process D47data instance:
data47.crunch()
data47.standardize()

# create D48data instance:
data48 = D48data(data47) # alternatively: data48 = D48data(session1 + session2)

# process D48data instance:
data48.crunch()
data48.standardize()

# output combined results:
table_of_sessions(data47, data48)
table_of_samples(data47, data48)
table_of_analyses(data47, data48)

Expected output:

––––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––––  ––––––––––––––  ––––––  –––––––––––––  –––––––––––––––  ––––––––––––––
Session     Na  Nu  d13Cwg_VPDB  d18Owg_VSMOW  r_d13C  r_d18O   r_D47      a_47 ± SE  1e3 x b_47 ± SE       c_47 ± SE   r_D48      a_48 ± SE  1e3 x b_48 ± SE       c_48 ± SE
––––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––––  ––––––––––––––  ––––––  –––––––––––––  –––––––––––––––  ––––––––––––––
Session_01   9   3       -4.000        26.000  0.0000  0.0000  0.0098  1.021 ± 0.019   -0.398 ± 0.260  -0.903 ± 0.006  0.0486  0.540 ± 0.151    1.235 ± 0.607  -0.390 ± 0.025
Session_02   9   3       -4.000        26.000  0.0000  0.0000  0.0090  1.015 ± 0.019    0.376 ± 0.260  -0.905 ± 0.006  0.0186  1.350 ± 0.156   -0.871 ± 0.608  -0.504 ± 0.027
––––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––––  ––––––––––––––  ––––––  –––––––––––––  –––––––––––––––  ––––––––––––––


––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––
Sample  N  d13C_VPDB  d18O_VSMOW     D47      SE    95% CL      SD  p_Levene     D48      SE    95% CL      SD  p_Levene
––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––
ETH-1   6       2.02       37.02  0.2052                    0.0078            0.1380                    0.0223          
ETH-2   6     -10.17       19.88  0.2085                    0.0036            0.1380                    0.0482          
ETH-3   6       1.71       37.45  0.6132                    0.0080            0.2700                    0.0176          
FOO     6      -5.00       28.91  0.3026  0.0044  ± 0.0093  0.0121     0.164  0.1397  0.0121  ± 0.0255  0.0267     0.127
––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––


–––  ––––––––––  ––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  –––––––––  ––––––––  ––––––––
UID     Session  Sample  d13Cwg_VPDB  d18Owg_VSMOW        d45        d46         d47         d48         d49   d13C_VPDB  d18O_VSMOW     D47raw     D48raw     D49raw       D47       D48
–––  ––––––––––  ––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  –––––––––  ––––––––  ––––––––
1    Session_01   ETH-1       -4.000        26.000   6.018962  10.747026   16.120787   21.286237   27.780042    2.020000   37.024281  -0.708176  -0.316435  -0.000013  0.197297  0.087763
2    Session_01   ETH-1       -4.000        26.000   6.018962  10.747026   16.132240   21.307795   27.780042    2.020000   37.024281  -0.696913  -0.295333  -0.000013  0.208328  0.126791
3    Session_01   ETH-1       -4.000        26.000   6.018962  10.747026   16.132438   21.313884   27.780042    2.020000   37.024281  -0.696718  -0.289374  -0.000013  0.208519  0.137813
4    Session_01   ETH-2       -4.000        26.000  -5.995859  -5.976076  -12.700300  -12.210735  -18.023381  -10.170000   19.875825  -0.683938  -0.297902  -0.000002  0.209785  0.198705
5    Session_01   ETH-2       -4.000        26.000  -5.995859  -5.976076  -12.707421  -12.270781  -18.023381  -10.170000   19.875825  -0.691145  -0.358673  -0.000002  0.202726  0.086308
6    Session_01   ETH-2       -4.000        26.000  -5.995859  -5.976076  -12.700061  -12.278310  -18.023381  -10.170000   19.875825  -0.683696  -0.366292  -0.000002  0.210022  0.072215
7    Session_01   ETH-3       -4.000        26.000   5.742374  11.161270   16.684379   22.225827   28.306614    1.710000   37.450394  -0.273094  -0.216392  -0.000014  0.623472  0.270873
8    Session_01   ETH-3       -4.000        26.000   5.742374  11.161270   16.660163   22.233729   28.306614    1.710000   37.450394  -0.296906  -0.208664  -0.000014  0.600150  0.285167
9    Session_01   ETH-3       -4.000        26.000   5.742374  11.161270   16.675191   22.215632   28.306614    1.710000   37.450394  -0.282128  -0.226363  -0.000014  0.614623  0.252432
10   Session_01     FOO       -4.000        26.000  -0.840413   2.828738    1.328380    5.374933    4.665655   -5.000000   28.907344  -0.582131  -0.288924  -0.000006  0.314928  0.175105
11   Session_01     FOO       -4.000        26.000  -0.840413   2.828738    1.302220    5.384454    4.665655   -5.000000   28.907344  -0.608241  -0.279457  -0.000006  0.289356  0.192614
12   Session_01     FOO       -4.000        26.000  -0.840413   2.828738    1.322530    5.372841    4.665655   -5.000000   28.907344  -0.587970  -0.291004  -0.000006  0.309209  0.171257
13   Session_02   ETH-1       -4.000        26.000   6.018962  10.747026   16.140853   21.267202   27.780042    2.020000   37.024281  -0.688442  -0.335067  -0.000013  0.207730  0.138730
14   Session_02   ETH-1       -4.000        26.000   6.018962  10.747026   16.127087   21.256983   27.780042    2.020000   37.024281  -0.701980  -0.345071  -0.000013  0.194396  0.131311
15   Session_02   ETH-1       -4.000        26.000   6.018962  10.747026   16.148253   21.287779   27.780042    2.020000   37.024281  -0.681165  -0.314926  -0.000013  0.214898  0.153668
16   Session_02   ETH-2       -4.000        26.000  -5.995859  -5.976076  -12.715859  -12.204791  -18.023381  -10.170000   19.875825  -0.699685  -0.291887  -0.000002  0.207349  0.149128
17   Session_02   ETH-2       -4.000        26.000  -5.995859  -5.976076  -12.709763  -12.188685  -18.023381  -10.170000   19.875825  -0.693516  -0.275587  -0.000002  0.213426  0.161217
18   Session_02   ETH-2       -4.000        26.000  -5.995859  -5.976076  -12.715427  -12.253049  -18.023381  -10.170000   19.875825  -0.699249  -0.340727  -0.000002  0.207780  0.112907
19   Session_02   ETH-3       -4.000        26.000   5.742374  11.161270   16.685994   22.249463   28.306614    1.710000   37.450394  -0.271506  -0.193275  -0.000014  0.618328  0.244431
20   Session_02   ETH-3       -4.000        26.000   5.742374  11.161270   16.681351   22.298166   28.306614    1.710000   37.450394  -0.276071  -0.145641  -0.000014  0.613831  0.279758
21   Session_02   ETH-3       -4.000        26.000   5.742374  11.161270   16.676169   22.306848   28.306614    1.710000   37.450394  -0.281167  -0.137150  -0.000014  0.608813  0.286056
22   Session_02     FOO       -4.000        26.000  -0.840413   2.828738    1.324359    5.339497    4.665655   -5.000000   28.907344  -0.586144  -0.324160  -0.000006  0.314015  0.136535
23   Session_02     FOO       -4.000        26.000  -0.840413   2.828738    1.297658    5.325854    4.665655   -5.000000   28.907344  -0.612794  -0.337727  -0.000006  0.287767  0.126473
24   Session_02     FOO       -4.000        26.000  -0.840413   2.828738    1.310185    5.339898    4.665655   -5.000000   28.907344  -0.600291  -0.323761  -0.000006  0.300082  0.136830
–––  ––––––––––  ––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  –––––––––  ––––––––  ––––––––

3. Command-Line Interface (CLI)

Instead of writing Python code, you may use the CLI directly to process raw Δ47 and Δ48 data with reasonable defaults. The simplest approach is to call:

D47crunch rawdata.csv

This will create a directory named output and populate it with the processed results and figures, by calling the standard data-processing and reporting methods described above (wg(), crunch(), standardize(), summary(), table_of_sessions(), table_of_samples(), table_of_analyses(), and the various plotting methods).

You may specify a custom set of anchors instead of the default ones using the --anchors or -a option:

D47crunch -a anchors.csv rawdata.csv

In this case, the anchors.csv file (you may use any other file name) must have the following format:

Sample, d13C_VPDB, d18O_VPDB,    D47
 ETH-1,      2.02,     -2.19, 0.2052
 ETH-2,    -10.17,    -18.69, 0.2085
 ETH-3,      1.71,     -1.78, 0.6132
 ETH-4,          ,          , 0.4511

The samples with non-empty d13C_VPDB, d18O_VPDB, and D47 values are used to standardize δ13C, δ18O, and Δ47 values respectively.

You may also provide a list of analyses and/or samples to exclude from the input. This is done with the --exclude or -e option:

D47crunch -e badbatch.csv rawdata.csv

In this case, the badbatch.csv file (again, you may use a different file name) must have the following format:

UID, Sample
A03
A09
B06
   , MYBADSAMPLE-1
   , MYBADSAMPLE-2

This will exclude (ignore) analyses with the UIDs A03, A09, and B06, as well as all analyses of samples MYBADSAMPLE-1 and MYBADSAMPLE-2. The exclude file may have only the UID column, only the Sample column, or both, in any order.

The --output-dir or -o option may be used to specify a custom directory name for the output. For example, in unix-like shells the following command will create a time-stamped output directory:

D47crunch -o `date "+%Y-%m-%d-%Hh%M"` rawdata.csv

To process Δ48 as well as Δ47, just add the --D48 option.
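
Since the CLI is built on typer (see the imports in the module source below), a summary of all available options may be printed out with the standard help flag:

D47crunch --help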

4. API Documentation

'''
Standardization and analytical error propagation of Δ47 and Δ48 clumped-isotope measurements

Process and standardize carbonate and/or CO2 clumped-isotope analyses,
from low-level data out of a dual-inlet mass spectrometer to final, “absolute”
Δ47, Δ48 and Δ49 values with fully propagated analytical error estimates
([Daëron, 2021](https://doi.org/10.1029/2020GC009592)).

The **tutorial** section takes you through a series of simple steps to import/process data and print out the results.
The **how-to** section provides instructions applicable to various specific tasks.

.. include:: ../docs/tutorial.md
.. include:: ../docs/howto.md
.. include:: ../docs/cli.md

# 4. API Documentation
'''

__docformat__ = "restructuredtext"
__author__    = 'Mathieu Daëron'
__contact__   = 'daeron@lsce.ipsl.fr'
__copyright__ = 'Copyright (c) 2023 Mathieu Daëron'
__license__   = 'Modified BSD License - https://opensource.org/licenses/BSD-3-Clause'
__date__      = '2023-10-04'
__version__   = '2.4.0'

import os
import numpy as np
import typer
from typing_extensions import Annotated
from statistics import stdev
from scipy.stats import t as tstudent
from scipy.stats import levene
from scipy.interpolate import interp1d
from numpy import linalg
from lmfit import Minimizer, Parameters, report_fit
from matplotlib import pyplot as ppl
from datetime import datetime as dt
from functools import wraps
from colorsys import hls_to_rgb
from matplotlib import rcParams

typer.rich_utils.STYLE_HELPTEXT = ''

rcParams['font.family'] = 'sans-serif'
rcParams['font.sans-serif'] = 'Helvetica'
rcParams['font.size'] = 10
rcParams['mathtext.fontset'] = 'custom'
rcParams['mathtext.rm'] = 'sans'
rcParams['mathtext.bf'] = 'sans:bold'
rcParams['mathtext.it'] = 'sans:italic'
rcParams['mathtext.cal'] = 'sans:italic'
rcParams['mathtext.default'] = 'rm'
rcParams['xtick.major.size'] = 4
rcParams['xtick.major.width'] = 1
rcParams['ytick.major.size'] = 4
rcParams['ytick.major.width'] = 1
rcParams['axes.grid'] = False
rcParams['axes.linewidth'] = 1
rcParams['grid.linewidth'] = .75
rcParams['grid.linestyle'] = '-'
rcParams['grid.alpha'] = .15
rcParams['savefig.dpi'] = 150

Petersen_etal_CO2eqD47 = np.array([[-12, 1.147113572], [-11, 1.139961218], [-10, 1.132872856], [-9, 1.125847677], [-8, 1.118884889], [-7, 1.111983708], [-6, 1.105143366], [-5, 1.098363105], [-4, 1.091642182], [-3, 1.084979862], [-2, 1.078375423], [-1, 1.071828156], [0, 1.065337360], [1, 1.058902349], [2, 1.052522443], [3, 1.046196976], [4, 1.039925291], [5, 1.033706741], [6, 1.027540690], [7, 1.021426510], [8, 1.015363585], [9, 1.009351306], [10, 1.003389075], [11, 0.997476303], [12, 0.991612409], [13, 0.985796821], [14, 0.980028975], [15, 0.974308318], [16, 0.968634304], [17, 0.963006392], [18, 0.957424055], [19, 0.951886769], [20, 0.946394020], [21, 0.940945302], [22, 0.935540114], [23, 0.930177964], [24, 0.924858369], [25, 0.919580851], [26, 0.914344938], [27, 0.909150167], [28, 0.903996080], [29, 0.898882228], [30, 0.893808167], [31, 0.888773459], [32, 0.883777672], [33, 0.878820382], [34, 0.873901170], [35, 0.869019623], [36, 0.864175334], [37, 0.859367901], [38, 0.854596929], [39, 0.849862028], [40, 0.845162813], [41, 0.840498905], [42, 0.835869931], [43, 0.831275522], [44, 0.826715314], [45, 0.822188950], [46, 0.817696075], [47, 0.813236341], [48, 0.808809404], [49, 0.804414926], [50, 0.800052572], [51, 0.795722012], [52, 0.791422922], [53, 0.787154979], [54, 0.782917869], [55, 0.778711277], [56, 0.774534898], [57, 0.770388426], [58, 0.766271562], [59, 0.762184010], [60, 0.758125479], [61, 0.754095680], [62, 0.750094329], [63, 0.746121147], [64, 0.742175856], [65, 0.738258184], [66, 0.734367860], [67, 0.730504620], [68, 0.726668201], [69, 0.722858343], [70, 0.719074792], [71, 0.715317295], [72, 0.711585602], [73, 0.707879469], [74, 0.704198652], [75, 0.700542912], [76, 0.696912012], [77, 0.693305719], [78, 0.689723802], [79, 0.686166034], [80, 0.682632189], [81, 0.679122047], [82, 0.675635387], [83, 0.672171994], [84, 0.668731654], [85, 0.665314156], [86, 0.661919291], [87, 0.658546854], [88, 0.655196641], [89, 0.651868451], [90, 0.648562087], [91, 0.645277352], [92, 0.642014054], [93, 0.638771999], [94, 0.635551001], [95, 0.632350872], [96, 0.629171428], [97, 0.626012487], [98, 0.622873870], [99, 0.619755397], [100, 0.616656895], [102, 0.610519107], [104, 0.604459143], [106, 0.598475670], [108, 0.592567388], [110, 0.586733026], [112, 0.580971342], [114, 0.575281125], [116, 0.569661187], [118, 0.564110371], [120, 0.558627545], [122, 0.553211600], [124, 0.547861454], [126, 0.542576048], [128, 0.537354347], [130, 0.532195337], [132, 0.527098028], [134, 0.522061450], [136, 0.517084654], [138, 0.512166711], [140, 0.507306712], [142, 0.502503768], [144, 0.497757006], [146, 0.493065573], [148, 0.488428634], [150, 0.483845370], [152, 0.479314980], [154, 0.474836677], [156, 0.470409692], [158, 0.466033271], [160, 0.461706674], [162, 0.457429176], [164, 0.453200067], [166, 0.449018650], [168, 0.444884242], [170, 0.440796174], [172, 0.436753787], [174, 0.432756438], [176, 0.428803494], [178, 0.424894334], [180, 0.421028350], [182, 0.417204944], [184, 0.413423530], [186, 0.409683531], [188, 0.405984383], [190, 0.402325531], [192, 0.398706429], [194, 0.395126543], [196, 0.391585347], [198, 0.388082324], [200, 0.384616967], [202, 0.381188778], [204, 0.377797268], [206, 0.374441954], [208, 0.371122364], [210, 0.367838033], [212, 0.364588505], [214, 0.361373329], [216, 0.358192065], [218, 0.355044277], [220, 0.351929540], [222, 0.348847432], [224, 0.345797540], [226, 0.342779460], [228, 0.339792789], [230, 0.336837136], [232, 0.333912113], [234, 0.331017339], [236, 0.328152439], [238, 
0.325317046], [240, 0.322510795], [242, 0.319733329], [244, 0.316984297], [246, 0.314263352], [248, 0.311570153], [250, 0.308904364], [252, 0.306265654], [254, 0.303653699], [256, 0.301068176], [258, 0.298508771], [260, 0.295975171], [262, 0.293467070], [264, 0.290984167], [266, 0.288526163], [268, 0.286092765], [270, 0.283683684], [272, 0.281298636], [274, 0.278937339], [276, 0.276599517], [278, 0.274284898], [280, 0.271993211], [282, 0.269724193], [284, 0.267477582], [286, 0.265253121], [288, 0.263050554], [290, 0.260869633], [292, 0.258710110], [294, 0.256571741], [296, 0.254454286], [298, 0.252357508], [300, 0.250281174], [302, 0.248225053], [304, 0.246188917], [306, 0.244172542], [308, 0.242175707], [310, 0.240198194], [312, 0.238239786], [314, 0.236300272], [316, 0.234379441], [318, 0.232477087], [320, 0.230593005], [322, 0.228726993], [324, 0.226878853], [326, 0.225048388], [328, 0.223235405], [330, 0.221439711], [332, 0.219661118], [334, 0.217899439], [336, 0.216154491], [338, 0.214426091], [340, 0.212714060], [342, 0.211018220], [344, 0.209338398], [346, 0.207674420], [348, 0.206026115], [350, 0.204393315], [355, 0.200378063], [360, 0.196456139], [365, 0.192625077], [370, 0.188882487], [375, 0.185226048], [380, 0.181653511], [385, 0.178162694], [390, 0.174751478], [395, 0.171417807], [400, 0.168159686], [405, 0.164975177], [410, 0.161862398], [415, 0.158819521], [420, 0.155844772], [425, 0.152936426], [430, 0.150092806], [435, 0.147312286], [440, 0.144593281], [445, 0.141934254], [450, 0.139333710], [455, 0.136790195], [460, 0.134302294], [465, 0.131868634], [470, 0.129487876], [475, 0.127158722], [480, 0.124879906], [485, 0.122650197], [490, 0.120468398], [495, 0.118333345], [500, 0.116243903], [505, 0.114198970], [510, 0.112197471], [515, 0.110238362], [520, 0.108320625], [525, 0.106443271], [530, 0.104605335], [535, 0.102805877], [540, 0.101043985], [545, 0.099318768], [550, 0.097629359], [555, 0.095974915], [560, 0.094354612], [565, 0.092767650], [570, 0.091213248], [575, 0.089690648], [580, 0.088199108], [585, 0.086737906], [590, 0.085306341], [595, 0.083903726], [600, 0.082529395], [605, 0.081182697], [610, 0.079862998], [615, 0.078569680], [620, 0.077302141], [625, 0.076059794], [630, 0.074842066], [635, 0.073648400], [640, 0.072478251], [645, 0.071331090], [650, 0.070206399], [655, 0.069103674], [660, 0.068022424], [665, 0.066962168], [670, 0.065922439], [675, 0.064902780], [680, 0.063902748], [685, 0.062921909], [690, 0.061959837], [695, 0.061016122], [700, 0.060090360], [705, 0.059182157], [710, 0.058291131], [715, 0.057416907], [720, 0.056559120], [725, 0.055717414], [730, 0.054891440], [735, 0.054080860], [740, 0.053285343], [745, 0.052504565], [750, 0.051738210], [755, 0.050985971], [760, 0.050247546], [765, 0.049522643], [770, 0.048810974], [775, 0.048112260], [780, 0.047426227], [785, 0.046752609], [790, 0.046091145], [795, 0.045441581], [800, 0.044803668], [805, 0.044177164], [810, 0.043561831], [815, 0.042957438], [820, 0.042363759], [825, 0.041780573], [830, 0.041207664], [835, 0.040644822], [840, 0.040091839], [845, 0.039548516], [850, 0.039014654], [855, 0.038490063], [860, 0.037974554], [865, 0.037467944], [870, 0.036970054], [875, 0.036480707], [880, 0.035999734], [885, 0.035526965], [890, 0.035062238], [895, 0.034605393], [900, 0.034156272], [905, 0.033714724], [910, 0.033280598], [915, 0.032853749], [920, 0.032434032], [925, 0.032021309], [930, 0.031615443], [935, 0.031216300], [940, 0.030823749], [945, 0.030437663], [950, 0.030057915], [955, 0.029684385], 
[960, 0.029316951], [965, 0.028955498], [970, 0.028599910], [975, 0.028250075], [980, 0.027905884], [985, 0.027567229], [990, 0.027234006], [995, 0.026906112], [1000, 0.026583445], [1005, 0.026265908], [1010, 0.025953405], [1015, 0.025645841], [1020, 0.025343124], [1025, 0.025045163], [1030, 0.024751871], [1035, 0.024463160], [1040, 0.024178947], [1045, 0.023899147], [1050, 0.023623680], [1055, 0.023352467], [1060, 0.023085429], [1065, 0.022822491], [1070, 0.022563577], [1075, 0.022308615], [1080, 0.022057533], [1085, 0.021810260], [1090, 0.021566729], [1095, 0.021326872], [1100, 0.021090622]])
_fCO2eqD47_Petersen = interp1d(Petersen_etal_CO2eqD47[:,0], Petersen_etal_CO2eqD47[:,1])
def fCO2eqD47_Petersen(T):
	'''
	CO2 equilibrium Δ47 value as a function of T (in degrees C)
	according to [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127).

	'''
	return float(_fCO2eqD47_Petersen(T))

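# For instance, fCO2eqD47_Petersen(25.) returns 0.919580851, an exact node of
# the lookup table above (intermediate temperatures are interpolated linearly).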

Wang_etal_CO2eqD47 = np.array([[-83., 1.8954], [-73., 1.7530], [-63., 1.6261], [-53., 1.5126], [-43., 1.4104], [-33., 1.3182], [-23., 1.2345], [-13., 1.1584], [-3., 1.0888], [7., 1.0251], [17., 0.9665], [27., 0.9125], [37., 0.8626], [47., 0.8164], [57., 0.7734], [67., 0.7334], [87., 0.6612], [97., 0.6286], [107., 0.5980], [117., 0.5693], [127., 0.5423], [137., 0.5169], [147., 0.4930], [157., 0.4704], [167., 0.4491], [177., 0.4289], [187., 0.4098], [197., 0.3918], [207., 0.3747], [217., 0.3585], [227., 0.3431], [237., 0.3285], [247., 0.3147], [257., 0.3015], [267., 0.2890], [277., 0.2771], [287., 0.2657], [297., 0.2550], [307., 0.2447], [317., 0.2349], [327., 0.2256], [337., 0.2167], [347., 0.2083], [357., 0.2002], [367., 0.1925], [377., 0.1851], [387., 0.1781], [397., 0.1714], [407., 0.1650], [417., 0.1589], [427., 0.1530], [437., 0.1474], [447., 0.1421], [457., 0.1370], [467., 0.1321], [477., 0.1274], [487., 0.1229], [497., 0.1186], [507., 0.1145], [517., 0.1105], [527., 0.1068], [537., 0.1031], [547., 0.0997], [557., 0.0963], [567., 0.0931], [577., 0.0901], [587., 0.0871], [597., 0.0843], [607., 0.0816], [617., 0.0790], [627., 0.0765], [637., 0.0741], [647., 0.0718], [657., 0.0695], [667., 0.0674], [677., 0.0654], [687., 0.0634], [697., 0.0615], [707., 0.0597], [717., 0.0579], [727., 0.0562], [737., 0.0546], [747., 0.0530], [757., 0.0515], [767., 0.0500], [777., 0.0486], [787., 0.0472], [797., 0.0459], [807., 0.0447], [817., 0.0435], [827., 0.0423], [837., 0.0411], [847., 0.0400], [857., 0.0390], [867., 0.0380], [877., 0.0370], [887., 0.0360], [897., 0.0351], [907., 0.0342], [917., 0.0333], [927., 0.0325], [937., 0.0317], [947., 0.0309], [957., 0.0302], [967., 0.0294], [977., 0.0287], [987., 0.0281], [997., 0.0274], [1007., 0.0268], [1017., 0.0261], [1027., 0.0255], [1037., 0.0249], [1047., 0.0244], [1057., 0.0238], [1067., 0.0233], [1077., 0.0228], [1087., 0.0223], [1097., 0.0218]])
_fCO2eqD47_Wang = interp1d(Wang_etal_CO2eqD47[:,0] - 0.15, Wang_etal_CO2eqD47[:,1])
def fCO2eqD47_Wang(T):
	'''
	CO2 equilibrium Δ47 value as a function of `T` (in degrees C)
	according to [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)
	(supplementary data of [Dennis et al., 2011](https://doi.org/10.1016/j.gca.2011.09.025)).
	'''
	return float(_fCO2eqD47_Wang(T))

def correlated_sum(X, C, w = None):
	'''
	Compute covariance-aware linear combinations

	**Parameters**

	+ `X`: list or 1-D array of values to sum
	+ `C`: covariance matrix for the elements of `X`
	+ `w`: list or 1-D array of weights to apply to the elements of `X`
	       (all equal to 1 by default)

	Return the sum (and its SE) of the elements of `X`, with optional weights equal
	to the elements of `w`, accounting for covariances between the elements of `X`.
	'''
	if w is None:
		w = [1 for x in X]
	return np.dot(w,X), (np.dot(w,np.dot(C,w)))**.5

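# Usage sketch (illustrative values, not from the original docs): with
# X = [1., 2.] and C = [[0.01, 0.005], [0.005, 0.04]], correlated_sum(X, C)
# returns (3.0, 0.2449...), since the variance of the sum is
# 0.01 + 0.04 + 2 * 0.005 = 0.06 and 0.06**0.5 ≈ 0.2449.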

def make_csv(x, hsep = ',', vsep = '\n'):
	'''
	Formats a list of lists of strings as a CSV

	**Parameters**

	+ `x`: the list of lists of strings to format
	+ `hsep`: the field separator (`,` by default)
	+ `vsep`: the line-ending convention to use (`\\n` by default)

	**Example**

	```py
	print(make_csv([['a', 'b', 'c'], ['d', 'e', 'f']]))
	```

	outputs:

	```py
	a,b,c
	d,e,f
	```
	'''
	return vsep.join([hsep.join(l) for l in x])


def pf(txt):
	'''
	Modify string `txt` to follow `lmfit.Parameter()` naming rules.
	'''
	return txt.replace('-','_').replace('.','_').replace(' ','_')

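# For example, pf('ETH-1') returns 'ETH_1' and pf('MY SAMPLE.2') returns
# 'MY_SAMPLE_2': dashes, dots and spaces all map to underscores.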

def smart_type(x):
	'''
	Tries to convert string `x` to a float if it includes a decimal point, or
	to an integer if it does not. If the conversion fails, return the original
	string unchanged.
	'''
	try:
		y = float(x)
	except ValueError:
		return x
	if '.' not in x:
		return int(y)
	return y

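# Examples: smart_type('5') returns the int 5, smart_type('5.0') returns
# the float 5.0, and smart_type('foo') returns the str 'foo' unchanged.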

def pretty_table(x, header = 1, hsep = '  ', vsep = '–', align = '<'):
	'''
	Reads a list of lists of strings and outputs an ascii table

	**Parameters**

	+ `x`: a list of lists of strings
	+ `header`: the number of lines to treat as header lines
	+ `hsep`: the horizontal separator between columns
	+ `vsep`: the character to use as vertical separator
	+ `align`: string of left (`<`) or right (`>`) alignment characters.

	**Example**

	```py
	x = [['A', 'B', 'C'], ['1', '1.9999', 'foo'], ['10', 'x', 'bar']]
	print(pretty_table(x))
	```
	yields:
	```
	––  ––––––  –––
	A        B    C
	––  ––––––  –––
	1   1.9999  foo
	10       x  bar
	––  ––––––  –––
	```

	'''
	txt = []
	widths = [np.max([len(e) for e in c]) for c in zip(*x)]

	if len(widths) > len(align):
		align += '>' * (len(widths)-len(align))
	sepline = hsep.join([vsep*w for w in widths])
	txt += [sepline]
	for k,l in enumerate(x):
		if k and k == header:
			txt += [sepline]
		txt += [hsep.join([f'{e:{a}{w}}' for e, w, a in zip(l, widths, align)])]
	txt += [sepline]
	txt += ['']
	return '\n'.join(txt)


def transpose_table(x):
	'''
	Transpose a list of lists

	**Parameters**

	+ `x`: a list of lists

	**Example**

	```py
	x = [[1, 2], [3, 4]]
	print(transpose_table(x)) # yields: [[1, 3], [2, 4]]
	```
	'''
	return [[e for e in c] for c in zip(*x)]

def w_avg(X, sX):
	'''
	Compute variance-weighted average

	Returns the value and SE of the weighted average of the elements of `X`,
	with relative weights equal to their inverse variances (`1/sX**2`).

	**Parameters**

	+ `X`: array-like of elements to average
	+ `sX`: array-like of the corresponding SE values

	**Tip**

	If `X` and `sX` are initially arranged as a list of `(x, sx)` doublets,
	they may be rearranged using `zip()`:

	```python
	foo = [(0, 1), (1, 0.5), (2, 0.5)]
	print(w_avg(*zip(*foo))) # yields: (1.3333333333333333, 0.3333333333333333)
	```
	'''
	X = [ x for x in X ]
	sX = [ sx for sx in sX ]
	W = [ sx**-2 for sx in sX ]
	W = [ w/sum(W) for w in W ]
	Xavg = sum([ w*x for w,x in zip(W,X) ])
	sXavg = sum([ w**2*sx**2 for w,sx in zip(W,sX) ])**.5
	return Xavg, sXavg

def read_csv(filename, sep = ''):
	'''
	Read contents of `filename` in csv format and return a list of dictionaries.

	In the csv string, spaces before and after field separators (`','` by default)
	are optional.

	**Parameters**

	+ `filename`: the csv file to read
	+ `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`,
	whichever appears most often in the contents of `filename`.
	'''
	with open(filename) as fid:
		txt = fid.read()

	if sep == '':
		sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
	txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
	return [{k: smart_type(v) for k,v in zip(txt[0], l) if v} for l in txt[1:]]

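# For instance, reading the tutorial's rawdata.csv returns one dictionary per
# analysis, e.g. {'UID': 'A01', 'Sample': 'ETH-1', 'd45': 5.79502, ...},
# with numeric fields converted by smart_type().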

def simulate_single_analysis(
	sample = 'MYSAMPLE',
	d13Cwg_VPDB = -4., d18Owg_VSMOW = 26.,
	d13C_VPDB = None, d18O_VPDB = None,
	D47 = None, D48 = None, D49 = 0., D17O = 0.,
	a47 = 1., b47 = 0., c47 = -0.9,
	a48 = 1., b48 = 0., c48 = -0.45,
	Nominal_D47 = None,
	Nominal_D48 = None,
	Nominal_d13C_VPDB = None,
	Nominal_d18O_VPDB = None,
	ALPHA_18O_ACID_REACTION = None,
	R13_VPDB = None,
	R17_VSMOW = None,
	R18_VSMOW = None,
	LAMBDA_17 = None,
	R18_VPDB = None,
	):
	'''
	Compute working-gas delta values for a single analysis, assuming a stochastic working
	gas and a “perfect” measurement (i.e. raw Δ values are identical to absolute values).

	**Parameters**

	+ `sample`: sample name
	+ `d13Cwg_VPDB`, `d18Owg_VSMOW`: bulk composition of the working gas
		(respectively –4 and +26 ‰ by default)
	+ `d13C_VPDB`, `d18O_VPDB`: bulk composition of the carbonate sample
	+ `D47`, `D48`, `D49`, `D17O`: clumped-isotope and oxygen-17 anomalies
		of the carbonate sample
	+ `Nominal_D47`, `Nominal_D48`: where to lookup Δ47 and
		Δ48 values if `D47` or `D48` are not specified
	+ `Nominal_d13C_VPDB`, `Nominal_d18O_VPDB`: where to lookup δ13C and
		δ18O values if `d13C_VPDB` or `d18O_VPDB` are not specified
	+ `ALPHA_18O_ACID_REACTION`: 18O/16O acid fractionation factor
	+ `R13_VPDB`, `R17_VSMOW`, `R18_VSMOW`, `LAMBDA_17`, `R18_VPDB`: oxygen-17
		correction parameters (by default equal to the `D4xdata` default values)

	Returns a dictionary with fields
	`['Sample', 'D17O', 'd13Cwg_VPDB', 'd18Owg_VSMOW', 'd45', 'd46', 'd47', 'd48', 'd49']`.
	'''

	if Nominal_d13C_VPDB is None:
		Nominal_d13C_VPDB = D4xdata().Nominal_d13C_VPDB

	if Nominal_d18O_VPDB is None:
		Nominal_d18O_VPDB = D4xdata().Nominal_d18O_VPDB

	if ALPHA_18O_ACID_REACTION is None:
		ALPHA_18O_ACID_REACTION = D4xdata().ALPHA_18O_ACID_REACTION

	if R13_VPDB is None:
		R13_VPDB = D4xdata().R13_VPDB

	if R17_VSMOW is None:
		R17_VSMOW = D4xdata().R17_VSMOW

	if R18_VSMOW is None:
		R18_VSMOW = D4xdata().R18_VSMOW

	if LAMBDA_17 is None:
		LAMBDA_17 = D4xdata().LAMBDA_17

	if R18_VPDB is None:
		R18_VPDB = D4xdata().R18_VPDB

	R17_VPDB = R17_VSMOW * (R18_VPDB / R18_VSMOW) ** LAMBDA_17

	if Nominal_D47 is None:
		Nominal_D47 = D47data().Nominal_D47

	if Nominal_D48 is None:
		Nominal_D48 = D48data().Nominal_D48

	if d13C_VPDB is None:
		if sample in Nominal_d13C_VPDB:
			d13C_VPDB = Nominal_d13C_VPDB[sample]
		else:
			raise KeyError(f"Sample {sample} is missing d13C_VPDB value, and it is not defined in Nominal_d13C_VPDB.")

	if d18O_VPDB is None:
		if sample in Nominal_d18O_VPDB:
			d18O_VPDB = Nominal_d18O_VPDB[sample]
		else:
			raise KeyError(f"Sample {sample} is missing d18O_VPDB value, and it is not defined in Nominal_d18O_VPDB.")

	if D47 is None:
		if sample in Nominal_D47:
			D47 = Nominal_D47[sample]
		else:
			raise KeyError(f"Sample {sample} is missing D47 value, and it is not defined in Nominal_D47.")

	if D48 is None:
		if sample in Nominal_D48:
			D48 = Nominal_D48[sample]
		else:
			raise KeyError(f"Sample {sample} is missing D48 value, and it is not defined in Nominal_D48.")

	X = D4xdata()
	X.R13_VPDB = R13_VPDB
	X.R17_VSMOW = R17_VSMOW
	X.R18_VSMOW = R18_VSMOW
	X.LAMBDA_17 = LAMBDA_17
	X.R18_VPDB = R18_VPDB
	X.R17_VPDB = R17_VSMOW * (R18_VPDB / R18_VSMOW)**LAMBDA_17

	R45wg, R46wg, R47wg, R48wg, R49wg = X.compute_isobar_ratios(
		R13 = R13_VPDB * (1 + d13Cwg_VPDB/1000),
		R18 = R18_VSMOW * (1 + d18Owg_VSMOW/1000),
		)
	R45, R46, R47, R48, R49 = X.compute_isobar_ratios(
		R13 = R13_VPDB * (1 + d13C_VPDB/1000),
		R18 = R18_VPDB * (1 + d18O_VPDB/1000) * ALPHA_18O_ACID_REACTION,
		D17O=D17O, D47=D47, D48=D48, D49=D49,
		)
	R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = X.compute_isobar_ratios(
		R13 = R13_VPDB * (1 + d13C_VPDB/1000),
		R18 = R18_VPDB * (1 + d18O_VPDB/1000) * ALPHA_18O_ACID_REACTION,
		D17O=D17O,
		)

	d45 = 1000 * (R45/R45wg - 1)
	d46 = 1000 * (R46/R46wg - 1)
	d47 = 1000 * (R47/R47wg - 1)
	d48 = 1000 * (R48/R48wg - 1)
	d49 = 1000 * (R49/R49wg - 1)

	for k in range(3): # dumb iteration to adjust for small changes in d47
		R47raw = (1 + (a47 * D47 + b47 * d47 + c47)/1000) * R47stoch
		R48raw = (1 + (a48 * D48 + b48 * d48 + c48)/1000) * R48stoch
		d47 = 1000 * (R47raw/R47wg - 1)
		d48 = 1000 * (R48raw/R48wg - 1)

	return dict(
		Sample = sample,
		D17O = D17O,
		d13Cwg_VPDB = d13Cwg_VPDB,
		d18Owg_VSMOW = d18Owg_VSMOW,
		d45 = d45,
		d46 = d46,
		d47 = d47,
		d48 = d48,
		d49 = d49,
		)

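# Minimal usage sketch (not from the original docs): simulate a single
# “perfect” ETH-1 analysis, relying on the nominal values defined by
# D4xdata, D47data and D48data:
# simulate_single_analysis(sample = 'ETH-1')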

def virtual_data(
	samples = [],
	a47 = 1., b47 = 0., c47 = -0.9,
	a48 = 1., b48 = 0., c48 = -0.45,
	rd45 = 0.020, rd46 = 0.060,
	rD47 = 0.015, rD48 = 0.045,
	d13Cwg_VPDB = None, d18Owg_VSMOW = None,
	session = None,
	Nominal_D47 = None, Nominal_D48 = None,
	Nominal_d13C_VPDB = None, Nominal_d18O_VPDB = None,
	ALPHA_18O_ACID_REACTION = None,
	R13_VPDB = None,
	R17_VSMOW = None,
	R18_VSMOW = None,
	LAMBDA_17 = None,
	R18_VPDB = None,
	seed = 0,
	shuffle = True,
	):
	'''
	Return list with simulated analyses from a single session.

	**Parameters**

	+ `samples`: a list of entries; each entry is a dictionary with the following fields:
	    * `Sample`: the name of the sample
	    * `d13C_VPDB`, `d18O_VPDB`: bulk composition of the carbonate sample
	    * `D47`, `D48`, `D49`, `D17O` (all optional): clumped-isotope and oxygen-17 anomalies of the carbonate sample
	    * `N`: how many analyses to generate for this sample
	+ `a47`: scrambling factor for Δ47
	+ `b47`: compositional nonlinearity for Δ47
	+ `c47`: working gas offset for Δ47
	+ `a48`: scrambling factor for Δ48
	+ `b48`: compositional nonlinearity for Δ48
	+ `c48`: working gas offset for Δ48
	+ `rd45`: analytical repeatability of δ45
	+ `rd46`: analytical repeatability of δ46
	+ `rD47`: analytical repeatability of Δ47
	+ `rD48`: analytical repeatability of Δ48
	+ `d13Cwg_VPDB`, `d18Owg_VSMOW`: bulk composition of the working gas
		(by default equal to the `simulate_single_analysis` default values)
	+ `session`: name of the session (no name by default)
	+ `Nominal_D47`, `Nominal_D48`: where to lookup Δ47 and Δ48 values
		if `D47` or `D48` are not specified (by default equal to the `simulate_single_analysis` defaults)
	+ `Nominal_d13C_VPDB`, `Nominal_d18O_VPDB`: where to lookup δ13C and
		δ18O values if `d13C_VPDB` or `d18O_VPDB` are not specified
		(by default equal to the `simulate_single_analysis` defaults)
	+ `ALPHA_18O_ACID_REACTION`: 18O/16O acid fractionation factor
		(by default equal to the `simulate_single_analysis` defaults)
	+ `R13_VPDB`, `R17_VSMOW`, `R18_VSMOW`, `LAMBDA_17`, `R18_VPDB`: oxygen-17
		correction parameters (by default equal to the `simulate_single_analysis` default)
	+ `seed`: explicitly set to a non-zero value to achieve random but repeatable simulations
	+ `shuffle`: randomly reorder the sequence of analyses


	Here is an example of using this method to generate an arbitrary combination of
	anchors and unknowns for a bunch of sessions:

	```py
	.. include:: ../code_examples/virtual_data/example.py
	```

	This should output something like:

	```
	.. include:: ../code_examples/virtual_data/output.txt
	```
	'''

	kwargs = locals().copy()

	from numpy import random as nprandom
	if seed:
		rng = nprandom.default_rng(seed)
	else:
		rng = nprandom.default_rng()

	N = sum([s['N'] for s in samples])
	errors45 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
	errors45 *= rd45 / stdev(errors45) # scale errors to rd45
	errors46 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
	errors46 *= rd46 / stdev(errors46) # scale errors to rd46
	errors47 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
	errors47 *= rD47 / stdev(errors47) # scale errors to rD47
	errors48 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
	errors48 *= rD48 / stdev(errors48) # scale errors to rD48

	k = 0
	out = []
	for s in samples:
		kw = {}
		kw['sample'] = s['Sample']
		kw = {
			**kw,
			**{var: kwargs[var]
				for var in [
					'd13Cwg_VPDB', 'd18Owg_VSMOW', 'ALPHA_18O_ACID_REACTION',
					'Nominal_D47', 'Nominal_D48', 'Nominal_d13C_VPDB', 'Nominal_d18O_VPDB',
					'R13_VPDB', 'R17_VSMOW', 'R18_VSMOW', 'LAMBDA_17', 'R18_VPDB',
					'a47', 'b47', 'c47', 'a48', 'b48', 'c48',
					]
				if kwargs[var] is not None},
			**{var: s[var]
				for var in ['d13C_VPDB', 'd18O_VPDB', 'D47', 'D48', 'D49', 'D17O']
				if var in s},
			}

		sN = s['N']
		while sN:
			out.append(simulate_single_analysis(**kw))
			out[-1]['d45'] += errors45[k]
			out[-1]['d46'] += errors46[k]
			out[-1]['d47'] += (errors45[k] + errors46[k] + errors47[k]) * a47
			out[-1]['d48'] += (2*errors46[k] + errors48[k]) * a48
			sN -= 1
			k += 1

		if session is not None:
			for r in out:
				r['Session'] = session

		if shuffle:
			nprandom.shuffle(out)

	return out

def table_of_samples(
	data47 = None,
	data48 = None,
	dir = 'output',
	filename = None,
	save_to_file = True,
	print_out = True,
	output = None,
	):
	'''
	Print out, save to disk and/or return a combined table of samples
	for a pair of `D47data` and `D48data` objects.

	**Parameters**

	+ `data47`: `D47data` instance
	+ `data48`: `D48data` instance
	+ `dir`: the directory in which to save the table
	+ `filename`: the name of the csv file to write to
	+ `save_to_file`: whether to save the table to disk
	+ `print_out`: whether to print out the table
	+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
		if set to `'raw'`: return a list of lists of strings
		(e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
	'''
	if data47 is None:
		if data48 is None:
			raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
		else:
			return data48.table_of_samples(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
	else:
		if data48 is None:
			return data47.table_of_samples(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
		else:
			out47 = data47.table_of_samples(save_to_file = False, print_out = False, output = 'raw')
			out48 = data48.table_of_samples(save_to_file = False, print_out = False, output = 'raw')
			out = transpose_table(transpose_table(out47) + transpose_table(out48)[4:])

			if save_to_file:
				if not os.path.exists(dir):
					os.makedirs(dir)
				if filename is None:
					filename = f'D47D48_samples.csv'
				with open(f'{dir}/{filename}', 'w') as fid:
					fid.write(make_csv(out))
			if print_out:
				print('\n'+pretty_table(out))
			if output == 'raw':
				return out
			elif output == 'pretty':
				return pretty_table(out)


def table_of_sessions(
	data47 = None,
	data48 = None,
	dir = 'output',
	filename = None,
	save_to_file = True,
	print_out = True,
	output = None,
	):
	'''
	Print out, save to disk and/or return a combined table of sessions
	for a pair of `D47data` and `D48data` objects.
	***Only applicable if the sessions in `data47` and those in `data48`
	consist of the exact same sets of analyses.***

	**Parameters**

	+ `data47`: `D47data` instance
	+ `data48`: `D48data` instance
	+ `dir`: the directory in which to save the table
	+ `filename`: the name of the csv file to write to
	+ `save_to_file`: whether to save the table to disk
	+ `print_out`: whether to print out the table
	+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
		if set to `'raw'`: return a list of lists of strings
		(e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
	'''
	if data47 is None:
		if data48 is None:
			raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
		else:
			return data48.table_of_sessions(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
	else:
		if data48 is None:
			return data47.table_of_sessions(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
		else:
			out47 = data47.table_of_sessions(save_to_file = False, print_out = False, output = 'raw')
			out48 = data48.table_of_sessions(save_to_file = False, print_out = False, output = 'raw')
			for k,x in enumerate(out47[0]):
				if k>7:
					out47[0][k] = out47[0][k].replace('a', 'a_47').replace('b', 'b_47').replace('c', 'c_47')
					out48[0][k] = out48[0][k].replace('a', 'a_48').replace('b', 'b_48').replace('c', 'c_48')
			out = transpose_table(transpose_table(out47) + transpose_table(out48)[7:])

			if save_to_file:
				if not os.path.exists(dir):
					os.makedirs(dir)
				if filename is None:
					filename = f'D47D48_sessions.csv'
				with open(f'{dir}/{filename}', 'w') as fid:
					fid.write(make_csv(out))
			if print_out:
				print('\n'+pretty_table(out))
			if output == 'raw':
				return out
			elif output == 'pretty':
				return pretty_table(out)

 677
 678def table_of_analyses(
 679	data47 = None,
 680	data48 = None,
 681	dir = 'output',
 682	filename = None,
 683	save_to_file = True,
 684	print_out = True,
 685	output = None,
 686	):
 687	'''
 688	Print out, save to disk and/or return a combined table of analyses
 689	for a pair of `D47data` and `D48data` objects.
 690
 691	If the sessions in `data47` and those in `data48` do not consist of
 692	the exact same sets of analyses, the table will have two columns
 693	`Session_47` and `Session_48` instead of a single `Session` column.
 694
 695	**Parameters**
 696
 697	+ `data47`: `D47data` instance
 698	+ `data48`: `D48data` instance
 699	+ `dir`: the directory in which to save the table
 700	+ `filename`: the name of the csv file to write to
 701	+ `save_to_file`: whether to save the table to disk
 702	+ `print_out`: whether to print out the table
 703	+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
 704	if set to `'raw'`: return a list of lists of strings
 705		(e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
 706	'''
 707	if data47 is None:
 708		if data48 is None:
 709			raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
 710		else:
 711			return data48.table_of_analyses(
 712				dir = dir,
 713				filename = filename,
 714				save_to_file = save_to_file,
 715				print_out = print_out,
 716				output = output
 717				)
 718	else:
 719		if data48 is None:
 720			return data47.table_of_analyses(
 721				dir = dir,
 722				filename = filename,
 723				save_to_file = save_to_file,
 724				print_out = print_out,
 725				output = output
 726				)
 727		else:
 728			out47 = data47.table_of_analyses(save_to_file = False, print_out = False, output = 'raw')
 729			out48 = data48.table_of_analyses(save_to_file = False, print_out = False, output = 'raw')
 730			
 731			if [l[1] for l in out47[1:]] == [l[1] for l in out48[1:]]: # if sessions are identical
 732				out = transpose_table(transpose_table(out47) + transpose_table(out48)[-1:])
 733			else:
 734				out47[0][1] = 'Session_47'
 735				out48[0][1] = 'Session_48'
 736				out47 = transpose_table(out47)
 737				out48 = transpose_table(out48)
 738				out = transpose_table(out47[:2] + out48[1:2] + out47[2:] + out48[-1:])
 739
 740			if save_to_file:
 741				if not os.path.exists(dir):
 742					os.makedirs(dir)
 743				if filename is None:
 744	filename = 'D47D48_analyses.csv'
 745				with open(f'{dir}/{filename}', 'w') as fid:
 746					fid.write(make_csv(out))
 747			if print_out:
 748				print('\n'+pretty_table(out))
 749			if output == 'raw':
 750				return out
 751			elif output == 'pretty':
 752				return pretty_table(out)
 753
 754
 755def _fullcovar(minresult, epsilon = 0.01, named = False):
 756	'''
 757	Construct full covariance matrix in the case of constrained parameters
 758	'''
 759	
 760	import asteval
 761	
 762	def f(values):
 763		interp = asteval.Interpreter()
 764		for n,v in zip(minresult.var_names, values):
 765			interp(f'{n} = {v}')
 766		for q in minresult.params:
 767			if minresult.params[q].expr:
 768				interp(f'{q} = {minresult.params[q].expr}')
 769		return np.array([interp.symtable[q] for q in minresult.params])
 770
 771	# construct Jacobian
 772	J = np.zeros((minresult.nvarys, len(minresult.params)))
 773	X = np.array([minresult.params[p].value for p in minresult.var_names])
 774	sX = np.array([minresult.params[p].stderr for p in minresult.var_names])
 775
 776	for j in range(minresult.nvarys):
 777		x1 = [_ for _ in X]
 778		x1[j] += epsilon * sX[j]
 779		x2 = [_ for _ in X]
 780		x2[j] -= epsilon * sX[j]
 781		J[j,:] = (f(x1) - f(x2)) / (2 * epsilon * sX[j])
 782
 783	_names = [q for q in minresult.params]
 784	_covar = J.T @ minresult.covar @ J
 785	_se = np.diag(_covar)**.5
 786	_correl = _covar.copy()
 787	for k,s in enumerate(_se):
 788		if s:
 789			_correl[k,:] /= s
 790			_correl[:,k] /= s
 791
 792	if named:
 793	_covar = {i: {j: _covar[ii,jj] for jj,j in enumerate(minresult.params)} for ii,i in enumerate(minresult.params)}
 794	_se = {i: _se[ii] for ii,i in enumerate(minresult.params)}
 795	_correl = {i: {j: _correl[ii,jj] for jj,j in enumerate(minresult.params)} for ii,i in enumerate(minresult.params)}
 796
 797	return _names, _covar, _se, _correl
 798
 799
 800class D4xdata(list):
 801	'''
 802	Store and process data for a large set of Δ47 and/or Δ48
 803	analyses, usually comprising more than one analytical session.
 804	'''
 805
 806	### 17O CORRECTION PARAMETERS
 807	R13_VPDB = 0.01118  # (Chang & Li, 1990)
 808	'''
 809	Absolute (13C/12C) ratio of VPDB.
 810	By default equal to 0.01118 ([Chang & Li, 1990](http://www.cnki.com.cn/Article/CJFDTotal-JXTW199004006.htm))
 811	'''
 812
 813	R18_VSMOW = 0.0020052  # (Baertschi, 1976)
 814	'''
 815	Absolute (18O/16O) ratio of VSMOW.
 816	By default equal to 0.0020052 ([Baertschi, 1976](https://doi.org/10.1016/0012-821X(76)90115-1))
 817	'''
 818
 819	LAMBDA_17 = 0.528  # (Barkan & Luz, 2005)
 820	'''
 821	Mass-dependent exponent for triple oxygen isotopes.
 822	By default equal to 0.528 ([Barkan & Luz, 2005](https://doi.org/10.1002/rcm.2250))
 823	'''
 824
 825	R17_VSMOW = 0.00038475  # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)
 826	'''
 827	Absolute (17O/16O) ratio of VSMOW.
 828	By default equal to 0.00038475
 829	([Assonov & Brenninkmeijer, 2003](https://dx.doi.org/10.1002/rcm.1011),
 830	rescaled to `R13_VPDB`)
 831	'''
 832
 833	R18_VPDB = R18_VSMOW * 1.03092
 834	'''
 835	Absolute (18O/16O) ratio of VPDB.
 836	By definition equal to `R18_VSMOW * 1.03092`.
 837	'''
 838
 839	R17_VPDB = R17_VSMOW * 1.03092 ** LAMBDA_17
 840	'''
 841	Absolute (17O/16O) ratio of VPDB.
 842	By definition equal to `R17_VSMOW * 1.03092 ** LAMBDA_17`.
 843	'''
 844
 845	LEVENE_REF_SAMPLE = 'ETH-3'
 846	'''
 847	After the Δ4x standardization step, each sample is tested to
 848	assess whether the Δ4x variance within all analyses for that
 849	sample differs significantly from that observed for a given reference
 850	sample (using [Levene's test](https://en.wikipedia.org/wiki/Levene%27s_test),
 851	which yields a p-value corresponding to the null hypothesis that the
 852	underlying variances are equal).
 853
 854	`LEVENE_REF_SAMPLE` (by default equal to `'ETH-3'`) specifies which
 855	sample should be used as a reference for this test.
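
**Example** (a sketch; `mydata` is assumed to be an existing `D47data` instance):

```py
mydata.LEVENE_REF_SAMPLE = 'ETH-1'  # use ETH-1 rather than ETH-3 as the reference sample
```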
 856	'''
 857
 858	ALPHA_18O_ACID_REACTION = round(np.exp(3.59 / (90 + 273.15) - 1.79e-3), 6)  # (Kim et al., 2007, calcite)
 859	'''
 860	Specifies the 18O/16O fractionation factor generally applicable
 861	to acid reactions in the dataset. Currently used by `D4xdata.wg()`
 862	and `D4xdata.standardize_d18O()`.
 863
 864	By default equal to 1.008129 (calcite reacted at 90 °C,
 865	[Kim et al., 2007](https://dx.doi.org/10.1016/j.chemgeo.2007.08.005)).
 866	'''
 867
 868	Nominal_d13C_VPDB = {
 869		'ETH-1': 2.02,
 870		'ETH-2': -10.17,
 871		'ETH-3': 1.71,
 872		}	# (Bernasconi et al., 2018)
 873	'''
 874	Nominal δ13C_VPDB values assigned to carbonate standards, used by
 875	`D4xdata.standardize_d13C()`.
 876
 877	By default equal to `{'ETH-1': 2.02, 'ETH-2': -10.17, 'ETH-3': 1.71}` after
 878	[Bernasconi et al. (2018)](https://doi.org/10.1029/2017GC007385).
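
**Example** (a sketch; `MY-STD` and its nominal value are hypothetical):

```py
mydata = D47crunch.D47data()
# declare an additional in-house carbonate standard,
# without affecting other D47data instances:
mydata.Nominal_d13C_VPDB = {
	**D47crunch.D47data.Nominal_d13C_VPDB,
	'MY-STD': 1.23,  # hypothetical nominal value
	}
```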
 879	'''
 880
 881	Nominal_d18O_VPDB = {
 882		'ETH-1': -2.19,
 883		'ETH-2': -18.69,
 884		'ETH-3': -1.78,
 885		}	# (Bernasconi et al., 2018)
 886	'''
 887	Nominal δ18O_VPDB values assigned to carbonate standards, used by
 888	`D4xdata.standardize_d18O()`.
 889
 890	By default equal to `{'ETH-1': -2.19, 'ETH-2': -18.69, 'ETH-3': -1.78}` after
 891	[Bernasconi et al. (2018)](https://doi.org/10.1029/2017GC007385).
 892	'''
 893
 894	d13C_STANDARDIZATION_METHOD = '2pt'
 895	'''
 896	Method by which to standardize δ13C values:
 897	
 898	+ `'none'`: do not apply any δ13C standardization.
 899	+ `'1pt'`: within each session, offset all initial δ13C values so as to
 900	minimize the difference between final δ13C_VPDB values and
 901	`Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB` is defined).
 902	+ `'2pt'`: within each session, apply an affine transformation to all δ13C
 903	values so as to minimize the difference between final δ13C_VPDB
 904	values and `Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB`
 905	is defined).
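
**Example** (a sketch): after the data have been loaded, the method may also be
overridden for a single session (the default session name `mySession` is assumed here):

```py
mydata.sessions['mySession']['d13C_standardization_method'] = '1pt'
```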
 906	'''
 907
 908	d18O_STANDARDIZATION_METHOD = '2pt'
 909	'''
 910	Method by which to standardize δ18O values:
 911	
 912	+ `'none'`: do not apply any δ18O standardization.
 913	+ `'1pt'`: within each session, offset all initial δ18O values so as to
 914	minimize the difference between final δ18O_VPDB values and
 915	`Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB` is defined).
 916	+ `'2pt'`: within each session, apply an affine transformation to all δ18O
 917	values so as to minimize the difference between final δ18O_VPDB
 918	values and `Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB`
 919	is defined).
 920	'''
 921
 922	def __init__(self, l = [], mass = '47', logfile = '', session = 'mySession', verbose = False):
 923		'''
 924		**Parameters**
 925
 926		+ `l`: a list of dictionaries, with each dictionary including at least the keys
 927		`Sample`, `d45`, `d46`, and `d47` or `d48`.
 928		+ `mass`: `'47'` or `'48'`
 929		+ `logfile`: if specified, write detailed logs to this file path when calling `D4xdata` methods.
 930		+ `session`: define session name for analyses without a `Session` key
 931		+ `verbose`: if `True`, print out detailed logs when calling `D4xdata` methods.
 932
 933		Returns a `D4xdata` object derived from `list`.
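
**Example** (a sketch, using placeholder delta values of zero for illustration only):

```py
mydata = D47crunch.D47data([
	{'Sample': 'ETH-1', 'd45': 0., 'd46': 0., 'd47': 0.},
	{'Sample': 'ETH-2', 'd45': 0., 'd46': 0., 'd47': 0.},
	])
```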
 934		'''
 935		self._4x = mass
 936		self.verbose = verbose
 937		self.prefix = 'D4xdata'
 938		self.logfile = logfile
 939		list.__init__(self, l)
 940		self.Nf = None
 941		self.repeatability = {}
 942		self.refresh(session = session)
 943
 944
 945	def make_verbal(oldfun):
 946		'''
 947		Decorator: allow temporarily changing `self.prefix` and overriding `self.verbose`.
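
Methods decorated with `make_verbal` accept an optional `verbose` keyword
argument overriding `self.verbose` for that call only, e.g. (a sketch):
`mydata.crunch(verbose = True)`.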
 948		'''
 949		@wraps(oldfun)
 950		def newfun(*args, verbose = '', **kwargs):
 951			myself = args[0]
 952			oldprefix = myself.prefix
 953			myself.prefix = oldfun.__name__
 954			if verbose != '':
 955				oldverbose = myself.verbose
 956				myself.verbose = verbose
 957			out = oldfun(*args, **kwargs)
 958			myself.prefix = oldprefix
 959			if verbose != '':
 960				myself.verbose = oldverbose
 961			return out
 962		return newfun
 963
 964
 965	def msg(self, txt):
 966		'''
 967		Log a message to `self.logfile`, and print it out if `verbose = True`
 968		'''
 969		self.log(txt)
 970		if self.verbose:
 971			print(f'{f"[{self.prefix}]":<16} {txt}')
 972
 973
 974	def vmsg(self, txt):
 975		'''
 976		Log a message to `self.logfile` and print it out
 977		'''
 978		self.log(txt)
 979		print(txt)
 980
 981
 982	def log(self, *txts):
 983		'''
 984		Log a message to `self.logfile`
 985		'''
 986		if self.logfile:
 987			with open(self.logfile, 'a') as fid:
 988				for txt in txts:
 989					fid.write(f'\n{dt.now().strftime("%Y-%m-%d %H:%M:%S")} {f"[{self.prefix}]":<16} {txt}')
 990
 991
 992	def refresh(self, session = 'mySession'):
 993		'''
 994		Update `self.sessions`, `self.samples`, `self.anchors`, and `self.unknowns`.
 995		'''
 996		self.fill_in_missing_info(session = session)
 997		self.refresh_sessions()
 998		self.refresh_samples()
 999
1000
1001	def refresh_sessions(self):
1002		'''
1003	Update `self.sessions`, resetting `scrambling_drift`, `slope_drift`, `wg_drift`
1004	and the δ13C/δ18O standardization methods to their default values for all sessions.
1005		'''
1006		self.sessions = {
1007			s: {'data': [r for r in self if r['Session'] == s]}
1008			for s in sorted({r['Session'] for r in self})
1009			}
1010		for s in self.sessions:
1011			self.sessions[s]['scrambling_drift'] = False
1012			self.sessions[s]['slope_drift'] = False
1013			self.sessions[s]['wg_drift'] = False
1014			self.sessions[s]['d13C_standardization_method'] = self.d13C_STANDARDIZATION_METHOD
1015			self.sessions[s]['d18O_standardization_method'] = self.d18O_STANDARDIZATION_METHOD
1016
1017
1018	def refresh_samples(self):
1019		'''
1020		Define `self.samples`, `self.anchors`, and `self.unknowns`.
1021		'''
1022		self.samples = {
1023			s: {'data': [r for r in self if r['Sample'] == s]}
1024			for s in sorted({r['Sample'] for r in self})
1025			}
1026		self.anchors = {s: self.samples[s] for s in self.samples if s in self.Nominal_D4x}
1027		self.unknowns = {s: self.samples[s] for s in self.samples if s not in self.Nominal_D4x}
1028
1029
1030	def read(self, filename, sep = '', session = ''):
1031		'''
1032		Read file in csv format to load data into a `D47data` object.
1033
1034		In the csv file, spaces before and after field separators (`','` by default)
1035		are optional. Each line corresponds to a single analysis.
1036
1037		The required fields are:
1038
1039		+ `UID`: a unique identifier
1040		+ `Session`: an identifier for the analytical session
1041		+ `Sample`: a sample identifier
1042		+ `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values
1043
1044	Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
1045	VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Any of the working-gas
1046	deltas `d47`, `d48` and `d49` that is not provided is set to NaN by default.
1047
1048		**Parameters**
1049
1050	+ `filename`: the path of the file to read
1051		+ `sep`: csv separator delimiting the fields
1052		+ `session`: set `Session` field to this string for all analyses
1053		'''
1054		with open(filename) as fid:
1055			self.input(fid.read(), sep = sep, session = session)
1056
1057
1058	def input(self, txt, sep = '', session = ''):
1059		'''
1060		Read `txt` string in csv format to load analysis data into a `D47data` object.
1061
1062		In the csv string, spaces before and after field separators (`','` by default)
1063		are optional. Each line corresponds to a single analysis.
1064
1065		The required fields are:
1066
1067		+ `UID`: a unique identifier
1068		+ `Session`: an identifier for the analytical session
1069		+ `Sample`: a sample identifier
1070		+ `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values
1071
1072	Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
1073	VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Any of the working-gas
1074	deltas `d47`, `d48` and `d49` that is not provided is set to NaN by default.
1075
1076		**Parameters**
1077
1078		+ `txt`: the csv string to read
1079		+ `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`,
1080	whichever appears most often in `txt`.
1081		+ `session`: set `Session` field to this string for all analyses
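
**Example** (a sketch, using placeholder delta values of zero and a semicolon separator):

```py
mydata = D47crunch.D47data()
mydata.input('UID;Sample;d45;d46;d47\nA01;ETH-1;0;0;0', session = 'Session01')
```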
1082		'''
1083		if sep == '':
1084			sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
1085		txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
1086		data = [{k: v if k in ['UID', 'Session', 'Sample'] else smart_type(v) for k,v in zip(txt[0], l) if v != ''} for l in txt[1:]]
1087
1088		if session != '':
1089			for r in data:
1090				r['Session'] = session
1091
1092		self += data
1093		self.refresh()
1094
1095
1096	@make_verbal
1097	def wg(self, samples = None, a18_acid = None):
1098		'''
1099		Compute bulk composition of the working gas for each session based on
1100		the carbonate standards defined in both `self.Nominal_d13C_VPDB` and
1101		`self.Nominal_d18O_VPDB`.
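
Both parameters are optional: `samples` defaults to the standards defined in both
`self.Nominal_d13C_VPDB` and `self.Nominal_d18O_VPDB`, and `a18_acid` defaults to
`self.ALPHA_18O_ACID_REACTION`. A sketch of explicit usage:

```py
mydata.wg(samples = ['ETH-1', 'ETH-2'], a18_acid = 1.008129)
```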
1102		'''
1103
1104		self.msg('Computing WG composition:')
1105
1106		if a18_acid is None:
1107			a18_acid = self.ALPHA_18O_ACID_REACTION
1108		if samples is None:
1109			samples = [s for s in self.Nominal_d13C_VPDB if s in self.Nominal_d18O_VPDB]
1110
1111		assert a18_acid, f'Acid fractionation factor should not be zero.'
1112
1113		samples = [s for s in samples if s in self.Nominal_d13C_VPDB and s in self.Nominal_d18O_VPDB]
1114		R45R46_standards = {}
1115		for sample in samples:
1116			d13C_vpdb = self.Nominal_d13C_VPDB[sample]
1117			d18O_vpdb = self.Nominal_d18O_VPDB[sample]
1118			R13_s = self.R13_VPDB * (1 + d13C_vpdb / 1000)
1119			R17_s = self.R17_VPDB * ((1 + d18O_vpdb / 1000) * a18_acid) ** self.LAMBDA_17
1120			R18_s = self.R18_VPDB * (1 + d18O_vpdb / 1000) * a18_acid
1121
1122			C12_s = 1 / (1 + R13_s)
1123			C13_s = R13_s / (1 + R13_s)
1124			C16_s = 1 / (1 + R17_s + R18_s)
1125			C17_s = R17_s / (1 + R17_s + R18_s)
1126			C18_s = R18_s / (1 + R17_s + R18_s)
1127
1128			C626_s = C12_s * C16_s ** 2
1129			C627_s = 2 * C12_s * C16_s * C17_s
1130			C628_s = 2 * C12_s * C16_s * C18_s
1131			C636_s = C13_s * C16_s ** 2
1132			C637_s = 2 * C13_s * C16_s * C17_s
1133			C727_s = C12_s * C17_s ** 2
1134
1135			R45_s = (C627_s + C636_s) / C626_s
1136			R46_s = (C628_s + C637_s + C727_s) / C626_s
1137			R45R46_standards[sample] = (R45_s, R46_s)
1138		
1139		for s in self.sessions:
1140			db = [r for r in self.sessions[s]['data'] if r['Sample'] in samples]
1141			assert db, f'No sample from {samples} found in session "{s}".'
1142# 			dbsamples = sorted({r['Sample'] for r in db})
1143
1144			X = [r['d45'] for r in db]
1145			Y = [R45R46_standards[r['Sample']][0] for r in db]
1146			x1, x2 = np.min(X), np.max(X)
1147
1148			if x1 < x2:
1149				wgcoord = x1/(x1-x2)
1150			else:
1151				wgcoord = 999
1152
1153			if wgcoord < -.5 or wgcoord > 1.5:
1154				# unreasonable to extrapolate to d45 = 0
1155				R45_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
1156			else :
1157				# d45 = 0 is reasonably well bracketed
1158				R45_wg = np.polyfit(X, Y, 1)[1]
1159
1160			X = [r['d46'] for r in db]
1161			Y = [R45R46_standards[r['Sample']][1] for r in db]
1162			x1, x2 = np.min(X), np.max(X)
1163
1164			if x1 < x2:
1165				wgcoord = x1/(x1-x2)
1166			else:
1167				wgcoord = 999
1168
1169			if wgcoord < -.5 or wgcoord > 1.5:
1170				# unreasonable to extrapolate to d46 = 0
1171				R46_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
1172			else :
1173				# d46 = 0 is reasonably well bracketed
1174				R46_wg = np.polyfit(X, Y, 1)[1]
1175
1176			d13Cwg_VPDB, d18Owg_VSMOW = self.compute_bulk_delta(R45_wg, R46_wg)
1177
1178			self.msg(f'Session {s} WG:   δ13C_VPDB = {d13Cwg_VPDB:.3f}   δ18O_VSMOW = {d18Owg_VSMOW:.3f}')
1179
1180			self.sessions[s]['d13Cwg_VPDB'] = d13Cwg_VPDB
1181			self.sessions[s]['d18Owg_VSMOW'] = d18Owg_VSMOW
1182			for r in self.sessions[s]['data']:
1183				r['d13Cwg_VPDB'] = d13Cwg_VPDB
1184				r['d18Owg_VSMOW'] = d18Owg_VSMOW
1185
1186
1187	def compute_bulk_delta(self, R45, R46, D17O = 0):
1188		'''
1189		Compute δ13C_VPDB and δ18O_VSMOW,
1190		by solving the generalized form of equation (17) from
1191		[Brand et al. (2010)](https://doi.org/10.1351/PAC-REP-09-01-05),
1192		assuming that δ18O_VSMOW is not too big (0 ± 50 ‰) and
1193		solving the corresponding second-order Taylor polynomial.
1194		(Appendix A of [Daëron et al., 2016](https://doi.org/10.1016/j.chemgeo.2016.08.014))
1195		'''
1196
1197		K = np.exp(D17O / 1000) * self.R17_VSMOW * self.R18_VSMOW ** -self.LAMBDA_17
1198
1199		A = -3 * K ** 2 * self.R18_VSMOW ** (2 * self.LAMBDA_17)
1200		B = 2 * K * R45 * self.R18_VSMOW ** self.LAMBDA_17
1201		C = 2 * self.R18_VSMOW
1202		D = -R46
1203
1204		aa = A * self.LAMBDA_17 * (2 * self.LAMBDA_17 - 1) + B * self.LAMBDA_17 * (self.LAMBDA_17 - 1) / 2
1205		bb = 2 * A * self.LAMBDA_17 + B * self.LAMBDA_17 + C
1206		cc = A + B + C + D
1207
1208		d18O_VSMOW = 1000 * (-bb + (bb ** 2 - 4 * aa * cc) ** .5) / (2 * aa)
1209
1210		R18 = (1 + d18O_VSMOW / 1000) * self.R18_VSMOW
1211		R17 = K * R18 ** self.LAMBDA_17
1212		R13 = R45 - 2 * R17
1213
1214		d13C_VPDB = 1000 * (R13 / self.R13_VPDB - 1)
1215
1216		return d13C_VPDB, d18O_VSMOW
1217
1218
1219	@make_verbal
1220	def crunch(self, verbose = ''):
1221		'''
1222		Compute bulk composition and raw clumped isotope anomalies for all analyses.
1223		'''
1224		for r in self:
1225			self.compute_bulk_and_clumping_deltas(r)
1226		self.standardize_d13C()
1227		self.standardize_d18O()
1228		self.msg(f"Crunched {len(self)} analyses.")
1229
1230
1231	def fill_in_missing_info(self, session = 'mySession'):
1232		'''
1233		Fill in optional fields with default values
1234		'''
1235		for i,r in enumerate(self):
1236			if 'D17O' not in r:
1237				r['D17O'] = 0.
1238			if 'UID' not in r:
1239				r['UID'] = f'{i+1}'
1240			if 'Session' not in r:
1241				r['Session'] = session
1242			for k in ['d47', 'd48', 'd49']:
1243				if k not in r:
1244					r[k] = np.nan
1245
1246
1247	def standardize_d13C(self):
1248		'''
1249	Perform δ13C standardization within each session `s` according to
1250	`self.sessions[s]['d13C_standardization_method']`, which is defined by default
1251	by `D47data.refresh_sessions()` as equal to `self.d13C_STANDARDIZATION_METHOD`, but
1252	may be redefined arbitrarily at a later stage.
1253		'''
1254		for s in self.sessions:
1255			if self.sessions[s]['d13C_standardization_method'] in ['1pt', '2pt']:
1256				XY = [(r['d13C_VPDB'], self.Nominal_d13C_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d13C_VPDB]
1257				X,Y = zip(*XY)
1258				if self.sessions[s]['d13C_standardization_method'] == '1pt':
1259					offset = np.mean(Y) - np.mean(X)
1260					for r in self.sessions[s]['data']:
1261						r['d13C_VPDB'] += offset				
1262				elif self.sessions[s]['d13C_standardization_method'] == '2pt':
1263					a,b = np.polyfit(X,Y,1)
1264					for r in self.sessions[s]['data']:
1265						r['d13C_VPDB'] = a * r['d13C_VPDB'] + b
1266
1267	def standardize_d18O(self):
1268		'''
1269	Perform δ18O standardization within each session `s` according to
1270	`self.ALPHA_18O_ACID_REACTION` and `self.sessions[s]['d18O_standardization_method']`,
1271	which is defined by default by `D47data.refresh_sessions()` as equal to
1272	`self.d18O_STANDARDIZATION_METHOD`, but may be redefined arbitrarily at a later stage.
1273		'''
1274		for s in self.sessions:
1275			if self.sessions[s]['d18O_standardization_method'] in ['1pt', '2pt']:
1276				XY = [(r['d18O_VSMOW'], self.Nominal_d18O_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d18O_VPDB]
1277				X,Y = zip(*XY)
1278				Y = [(1000+y) * self.R18_VPDB * self.ALPHA_18O_ACID_REACTION / self.R18_VSMOW - 1000 for y in Y]
1279				if self.sessions[s]['d18O_standardization_method'] == '1pt':
1280					offset = np.mean(Y) - np.mean(X)
1281					for r in self.sessions[s]['data']:
1282						r['d18O_VSMOW'] += offset				
1283				elif self.sessions[s]['d18O_standardization_method'] == '2pt':
1284					a,b = np.polyfit(X,Y,1)
1285					for r in self.sessions[s]['data']:
1286						r['d18O_VSMOW'] = a * r['d18O_VSMOW'] + b
1287	
1288
1289	def compute_bulk_and_clumping_deltas(self, r):
1290		'''
1291		Compute δ13C_VPDB, δ18O_VSMOW, and raw Δ47, Δ48, Δ49 values for a single analysis `r`.
1292		'''
1293
1294		# Compute working gas R13, R18, and isobar ratios
1295		R13_wg = self.R13_VPDB * (1 + r['d13Cwg_VPDB'] / 1000)
1296		R18_wg = self.R18_VSMOW * (1 + r['d18Owg_VSMOW'] / 1000)
1297		R45_wg, R46_wg, R47_wg, R48_wg, R49_wg = self.compute_isobar_ratios(R13_wg, R18_wg)
1298
1299		# Compute analyte isobar ratios
1300		R45 = (1 + r['d45'] / 1000) * R45_wg
1301		R46 = (1 + r['d46'] / 1000) * R46_wg
1302		R47 = (1 + r['d47'] / 1000) * R47_wg
1303		R48 = (1 + r['d48'] / 1000) * R48_wg
1304		R49 = (1 + r['d49'] / 1000) * R49_wg
1305
1306		r['d13C_VPDB'], r['d18O_VSMOW'] = self.compute_bulk_delta(R45, R46, D17O = r['D17O'])
1307		R13 = (1 + r['d13C_VPDB'] / 1000) * self.R13_VPDB
1308		R18 = (1 + r['d18O_VSMOW'] / 1000) * self.R18_VSMOW
1309
1310		# Compute stochastic isobar ratios of the analyte
1311		R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = self.compute_isobar_ratios(
1312			R13, R18, D17O = r['D17O']
1313		)
1314
1315		# Check that R45/R45stoch and R46/R46stoch are undistinguishable from 1,
1316	# and raise a warning if the corresponding anomalies exceed 0.05 ppm (5e-8).
1317		if (R45 / R45stoch - 1) > 5e-8:
1318			self.vmsg(f'This is unexpected: R45/R45stoch - 1 = {1e6 * (R45 / R45stoch - 1):.3f} ppm')
1319		if (R46 / R46stoch - 1) > 5e-8:
1320			self.vmsg(f'This is unexpected: R46/R46stoch - 1 = {1e6 * (R46 / R46stoch - 1):.3f} ppm')
1321
1322		# Compute raw clumped isotope anomalies
1323		r['D47raw'] = 1000 * (R47 / R47stoch - 1)
1324		r['D48raw'] = 1000 * (R48 / R48stoch - 1)
1325		r['D49raw'] = 1000 * (R49 / R49stoch - 1)
1326
1327
1328	def compute_isobar_ratios(self, R13, R18, D17O=0, D47=0, D48=0, D49=0):
1329		'''
1330		Compute isobar ratios for a sample with isotopic ratios `R13` and `R18`,
1331		optionally accounting for non-zero values of Δ17O (`D17O`) and clumped isotope
1332		anomalies (`D47`, `D48`, `D49`), all expressed in permil.
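
**Example** (a sketch): the stochastic isobar ratios of a gas with VPDB-like
carbon and VSMOW-like oxygen composition:

```py
R45, R46, R47, R48, R49 = mydata.compute_isobar_ratios(
	R13 = mydata.R13_VPDB,
	R18 = mydata.R18_VSMOW,
	)
```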
1333		'''
1334
1335		# Compute R17
1336		R17 = self.R17_VSMOW * np.exp(D17O / 1000) * (R18 / self.R18_VSMOW) ** self.LAMBDA_17
1337
1338		# Compute isotope concentrations
1339		C12 = (1 + R13) ** -1
1340		C13 = C12 * R13
1341		C16 = (1 + R17 + R18) ** -1
1342		C17 = C16 * R17
1343		C18 = C16 * R18
1344
1345		# Compute stochastic isotopologue concentrations
1346		C626 = C16 * C12 * C16
1347		C627 = C16 * C12 * C17 * 2
1348		C628 = C16 * C12 * C18 * 2
1349		C636 = C16 * C13 * C16
1350		C637 = C16 * C13 * C17 * 2
1351		C638 = C16 * C13 * C18 * 2
1352		C727 = C17 * C12 * C17
1353		C728 = C17 * C12 * C18 * 2
1354		C737 = C17 * C13 * C17
1355		C738 = C17 * C13 * C18 * 2
1356		C828 = C18 * C12 * C18
1357		C838 = C18 * C13 * C18
1358
1359		# Compute stochastic isobar ratios
1360		R45 = (C636 + C627) / C626
1361		R46 = (C628 + C637 + C727) / C626
1362		R47 = (C638 + C728 + C737) / C626
1363		R48 = (C738 + C828) / C626
1364		R49 = C838 / C626
1365
1366		# Account for stochastic anomalies
1367		R47 *= 1 + D47 / 1000
1368		R48 *= 1 + D48 / 1000
1369		R49 *= 1 + D49 / 1000
1370
1371		# Return isobar ratios
1372		return R45, R46, R47, R48, R49
1373
1374
1375	def split_samples(self, samples_to_split = 'all', grouping = 'by_session'):
1376		'''
1377		Split unknown samples by UID (treat all analyses as different samples)
1378		or by session (treat analyses of a given sample in different sessions as
1379		different samples).
1380
1381		**Parameters**
1382
1383		+ `samples_to_split`: a list of samples to split, e.g., `['IAEA-C1', 'IAEA-C2']`
1384		+ `grouping`: `by_uid` | `by_session`
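
**Example** (a sketch, assuming `MYSAMPLE-1` was analyzed in more than one session):

```py
mydata.split_samples(['MYSAMPLE-1'], grouping = 'by_session')
mydata.standardize()   # pooled standardization of the split samples
mydata.unsplit_samples()
```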
1385		'''
1386		if samples_to_split == 'all':
1387			samples_to_split = [s for s in self.unknowns]
1388		gkeys = {'by_uid':'UID', 'by_session':'Session'}
1389		self.grouping = grouping.lower()
1390		if self.grouping in gkeys:
1391			gkey = gkeys[self.grouping]
1392		for r in self:
1393			if r['Sample'] in samples_to_split:
1394				r['Sample_original'] = r['Sample']
1395				r['Sample'] = f"{r['Sample']}__{r[gkey]}"
1396			elif r['Sample'] in self.unknowns:
1397				r['Sample_original'] = r['Sample']
1398		self.refresh_samples()
1399
1400
1401	def unsplit_samples(self, tables = False):
1402		'''
1403		Reverse the effects of `D47data.split_samples()`.
1404		
1405		This should only be used after `D4xdata.standardize()` with `method='pooled'`.
1406		
1407		After `D4xdata.standardize()` with `method='indep_sessions'`, one should
1408		probably use `D4xdata.combine_samples()` instead to reverse the effects of
1409		`D47data.split_samples()` with `grouping='by_uid'`, or `w_avg()` to reverse the
1410		effects of `D47data.split_samples()` with `grouping='by_sessions'` (because in
1411	effects of `D47data.split_samples()` with `grouping='by_session'` (because in
1412		'''
1413		unknowns_old = sorted({s for s in self.unknowns})
1414		CM_old = self.standardization.covar[:,:]
1415		VD_old = self.standardization.params.valuesdict().copy()
1416		vars_old = self.standardization.var_names
1417
1418		unknowns_new = sorted({r['Sample_original'] for r in self if 'Sample_original' in r})
1419
1420		Ns = len(vars_old) - len(unknowns_old)
1421		vars_new = vars_old[:Ns] + [f'D{self._4x}_{pf(u)}' for u in unknowns_new]
1422		VD_new = {k: VD_old[k] for k in vars_old[:Ns]}
1423
1424		W = np.zeros((len(vars_new), len(vars_old)))
1425		W[:Ns,:Ns] = np.eye(Ns)
1426		for u in unknowns_new:
1427			splits = sorted({r['Sample'] for r in self if 'Sample_original' in r and r['Sample_original'] == u})
1428			if self.grouping == 'by_session':
1429				weights = [self.samples[s][f'SE_D{self._4x}']**-2 for s in splits]
1430			elif self.grouping == 'by_uid':
1431				weights = [1 for s in splits]
1432			sw = sum(weights)
1433			weights = [w/sw for w in weights]
1434			W[vars_new.index(f'D{self._4x}_{pf(u)}'),[vars_old.index(f'D{self._4x}_{pf(s)}') for s in splits]] = weights[:]
1435
1436		CM_new = W @ CM_old @ W.T
1437		V = W @ np.array([[VD_old[k]] for k in vars_old])
1438		VD_new = {k:v[0] for k,v in zip(vars_new, V)}
1439
1440		self.standardization.covar = CM_new
1441		self.standardization.params.valuesdict = lambda : VD_new
1442		self.standardization.var_names = vars_new
1443
1444		for r in self:
1445			if r['Sample'] in self.unknowns:
1446				r['Sample_split'] = r['Sample']
1447				r['Sample'] = r['Sample_original']
1448
1449		self.refresh_samples()
1450		self.consolidate_samples()
1451		self.repeatabilities()
1452
1453		if tables:
1454			self.table_of_analyses()
1455			self.table_of_samples()
1456
1457	def assign_timestamps(self):
1458		'''
1459		Assign a time field `t` of type `float` to each analysis.
1460
1461		If `TimeTag` is one of the data fields, `t` is equal within a given session
1462		to `TimeTag` minus the mean value of `TimeTag` for that session.
1463	Otherwise, `TimeTag` is by default equal to the index of each analysis
1464	within its session, and `t` is defined as above.
1465		'''
1466		for session in self.sessions:
1467			sdata = self.sessions[session]['data']
1468			try:
1469				t0 = np.mean([r['TimeTag'] for r in sdata])
1470				for r in sdata:
1471					r['t'] = r['TimeTag'] - t0
1472			except KeyError:
1473				t0 = (len(sdata)-1)/2
1474				for t,r in enumerate(sdata):
1475					r['t'] = t - t0
1476
1477
1478	def report(self):
1479		'''
1480		Prints a report on the standardization fit.
1481		Only applicable after `D4xdata.standardize(method='pooled')`.
1482		'''
1483		report_fit(self.standardization)
1484
1485
1486	def combine_samples(self, sample_groups):
1487		'''
1488		Combine analyses of different samples to compute weighted average Δ4x
1489		and new error (co)variances corresponding to the groups defined by the `sample_groups`
1490		dictionary.
1491		
1492		Caution: samples are weighted by number of replicate analyses, which is a
1493		reasonable default behavior but is not always optimal (e.g., in the case of strongly
1494		correlated analytical errors for one or more samples).
1495		
1496	Returns a tuple of:
1497		
1498		+ the list of group names
1499		+ an array of the corresponding Δ4x values
1500		+ the corresponding (co)variance matrix
1501		
1502		**Parameters**
1503
1504		+ `sample_groups`: a dictionary of the form:
1505		```py
1506		{'group1': ['sample_1', 'sample_2'],
1507		 'group2': ['sample_3', 'sample_4', 'sample_5']}
1508		```
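
A sketch of typical usage, with hypothetical sample names matching the example
above and assuming `mydata` has already been standardized:

```py
groups, D4x_avg, CM = mydata.combine_samples({
	'group1': ['sample_1', 'sample_2'],
	'group2': ['sample_3', 'sample_4', 'sample_5'],
	})
```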
1509		'''
1510		
1511		samples = [s for k in sorted(sample_groups.keys()) for s in sorted(sample_groups[k])]
1512		groups = sorted(sample_groups.keys())
1513		group_total_weights = {k: sum([self.samples[s]['N'] for s in sample_groups[k]]) for k in groups}
1514		D4x_old = np.array([[self.samples[x][f'D{self._4x}']] for x in samples])
1515		CM_old = np.array([[self.sample_D4x_covar(x,y) for x in samples] for y in samples])
1516		W = np.array([
1517			[self.samples[i]['N']/group_total_weights[j] if i in sample_groups[j] else 0 for i in samples]
1518			for j in groups])
1519		D4x_new = W @ D4x_old
1520		CM_new = W @ CM_old @ W.T
1521
1522		return groups, D4x_new[:,0], CM_new
1523		
1524
1525	@make_verbal
1526	def standardize(self,
1527		method = 'pooled',
1528		weighted_sessions = [],
1529		consolidate = True,
1530		consolidate_tables = False,
1531		consolidate_plots = False,
1532		constraints = {},
1533		):
1534		'''
1535		Compute absolute Δ4x values for all replicate analyses and for sample averages.
1536		If `method` argument is set to `'pooled'`, the standardization processes all sessions
1537		in a single step, assuming that all samples (anchors and unknowns alike) are homogeneous,
1538		i.e. that their true Δ4x value does not change between sessions,
1539		([Daëron, 2021](https://doi.org/10.1029/2020GC009592)). If `method` argument is set to
1540		`'indep_sessions'`, the standardization processes each session independently, based only
1541	on anchor analyses.
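
**Example** (a sketch): allowing the WG offset of a session to drift linearly
with time before a pooled standardization (the default session name `mySession`
is assumed here):

```py
mydata.sessions['mySession']['wg_drift'] = True
mydata.standardize(method = 'pooled')
```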
1542		'''
1543
1544		self.standardization_method = method
1545		self.assign_timestamps()
1546
1547		if method == 'pooled':
1548			if weighted_sessions:
1549				for session_group in weighted_sessions:
1550					if self._4x == '47':
1551						X = D47data([r for r in self if r['Session'] in session_group])
1552					elif self._4x == '48':
1553						X = D48data([r for r in self if r['Session'] in session_group])
1554					X.Nominal_D4x = self.Nominal_D4x.copy()
1555					X.refresh()
1556					result = X.standardize(method = 'pooled', weighted_sessions = [], consolidate = False)
1557					w = np.sqrt(result.redchi)
1558	self.msg(f'Session group {session_group} RMSWD = {w:.4f}')
1559					for r in X:
1560						r[f'wD{self._4x}raw'] *= w
1561			else:
1562				self.msg(f'All D{self._4x}raw weights set to 1 ‰')
1563				for r in self:
1564					r[f'wD{self._4x}raw'] = 1.
1565
1566			params = Parameters()
1567			for k,session in enumerate(self.sessions):
1568				self.msg(f"Session {session}: scrambling_drift is {self.sessions[session]['scrambling_drift']}.")
1569				self.msg(f"Session {session}: slope_drift is {self.sessions[session]['slope_drift']}.")
1570				self.msg(f"Session {session}: wg_drift is {self.sessions[session]['wg_drift']}.")
1571				s = pf(session)
1572				params.add(f'a_{s}', value = 0.9)
1573				params.add(f'b_{s}', value = 0.)
1574				params.add(f'c_{s}', value = -0.9)
1575				params.add(f'a2_{s}', value = 0.,
1576# 					vary = self.sessions[session]['scrambling_drift'],
1577					)
1578				params.add(f'b2_{s}', value = 0.,
1579# 					vary = self.sessions[session]['slope_drift'],
1580					)
1581				params.add(f'c2_{s}', value = 0.,
1582# 					vary = self.sessions[session]['wg_drift'],
1583					)
1584				if not self.sessions[session]['scrambling_drift']:
1585					params[f'a2_{s}'].expr = '0'
1586				if not self.sessions[session]['slope_drift']:
1587					params[f'b2_{s}'].expr = '0'
1588				if not self.sessions[session]['wg_drift']:
1589					params[f'c2_{s}'].expr = '0'
1590
1591			for sample in self.unknowns:
1592				params.add(f'D{self._4x}_{pf(sample)}', value = 0.5)
1593
1594			for k in constraints:
1595				params[k].expr = constraints[k]
1596
1597			def residuals(p):
1598				R = []
1599				for r in self:
1600					session = pf(r['Session'])
1601					sample = pf(r['Sample'])
1602					if r['Sample'] in self.Nominal_D4x:
1603						R += [ (
1604							r[f'D{self._4x}raw'] - (
1605								p[f'a_{session}'] * self.Nominal_D4x[r['Sample']]
1606								+ p[f'b_{session}'] * r[f'd{self._4x}']
1607								+	p[f'c_{session}']
1608								+ r['t'] * (
1609									p[f'a2_{session}'] * self.Nominal_D4x[r['Sample']]
1610									+ p[f'b2_{session}'] * r[f'd{self._4x}']
1611									+	p[f'c2_{session}']
1612									)
1613								)
1614							) / r[f'wD{self._4x}raw'] ]
1615					else:
1616						R += [ (
1617							r[f'D{self._4x}raw'] - (
1618								p[f'a_{session}'] * p[f'D{self._4x}_{sample}']
1619								+ p[f'b_{session}'] * r[f'd{self._4x}']
1620								+	p[f'c_{session}']
1621								+ r['t'] * (
1622									p[f'a2_{session}'] * p[f'D{self._4x}_{sample}']
1623									+ p[f'b2_{session}'] * r[f'd{self._4x}']
1624									+	p[f'c2_{session}']
1625									)
1626								)
1627							) / r[f'wD{self._4x}raw'] ]
1628				return R
1629
1630			M = Minimizer(residuals, params)
1631			result = M.least_squares()
1632			self.Nf = result.nfree
1633			self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
1634			new_names, new_covar, new_se = _fullcovar(result)[:3]
1635			result.var_names = new_names
1636			result.covar = new_covar
1637
1638			for r in self:
1639				s = pf(r["Session"])
1640				a = result.params.valuesdict()[f'a_{s}']
1641				b = result.params.valuesdict()[f'b_{s}']
1642				c = result.params.valuesdict()[f'c_{s}']
1643				a2 = result.params.valuesdict()[f'a2_{s}']
1644				b2 = result.params.valuesdict()[f'b2_{s}']
1645				c2 = result.params.valuesdict()[f'c2_{s}']
1646				r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
1647				
1648
1649			self.standardization = result
1650
1651			for session in self.sessions:
1652				self.sessions[session]['Np'] = 3
1653				for k in ['scrambling', 'slope', 'wg']:
1654					if self.sessions[session][f'{k}_drift']:
1655						self.sessions[session]['Np'] += 1
1656
1657			if consolidate:
1658				self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
1659			return result
1660
1661
1662		elif method == 'indep_sessions':
1663
1664			if weighted_sessions:
1665				for session_group in weighted_sessions:
1666					X = D4xdata([r for r in self if r['Session'] in session_group], mass = self._4x)
1667					X.Nominal_D4x = self.Nominal_D4x.copy()
1668					X.refresh()
1669	# This is only done to assign r[f'wD{self._4x}raw'] for r in X:
1670					X.standardize(method = method, weighted_sessions = [], consolidate = False)
1671					self.msg(f'D{self._4x}raw weights set to {1000*X[0][f"wD{self._4x}raw"]:.1f} ppm for sessions in {session_group}')
1672			else:
1673				self.msg('All weights set to 1 ‰')
1674				for r in self:
1675					r[f'wD{self._4x}raw'] = 1
1676
1677			for session in self.sessions:
1678				s = self.sessions[session]
1679				p_names = ['a', 'b', 'c', 'a2', 'b2', 'c2']
1680				p_active = [True, True, True, s['scrambling_drift'], s['slope_drift'], s['wg_drift']]
1681				s['Np'] = sum(p_active)
1682				sdata = s['data']
1683
1684				A = np.array([
1685					[
1686						self.Nominal_D4x[r['Sample']] / r[f'wD{self._4x}raw'],
1687						r[f'd{self._4x}'] / r[f'wD{self._4x}raw'],
1688						1 / r[f'wD{self._4x}raw'],
1689						self.Nominal_D4x[r['Sample']] * r['t'] / r[f'wD{self._4x}raw'],
1690						r[f'd{self._4x}'] * r['t'] / r[f'wD{self._4x}raw'],
1691						r['t'] / r[f'wD{self._4x}raw']
1692						]
1693					for r in sdata if r['Sample'] in self.anchors
1694					])[:,p_active] # only keep columns for the active parameters
1695				Y = np.array([[r[f'D{self._4x}raw'] / r[f'wD{self._4x}raw']] for r in sdata if r['Sample'] in self.anchors])
1696				s['Na'] = Y.size
1697				CM = linalg.inv(A.T @ A)
1698				bf = (CM @ A.T @ Y).T[0,:]
1699				k = 0
1700				for n,a in zip(p_names, p_active):
1701					if a:
1702						s[n] = bf[k]
1703# 						self.msg(f'{n} = {bf[k]}')
1704						k += 1
1705					else:
1706						s[n] = 0.
1707# 						self.msg(f'{n} = 0.0')
1708
1709				for r in sdata :
1710					a, b, c, a2, b2, c2 = s['a'], s['b'], s['c'], s['a2'], s['b2'], s['c2']
1711					r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
1712					r[f'wD{self._4x}'] = r[f'wD{self._4x}raw'] / (a + a2 * r['t'])
1713
1714				s['CM'] = np.zeros((6,6))
1715				i = 0
1716				k_active = [j for j,a in enumerate(p_active) if a]
1717				for j,a in enumerate(p_active):
1718					if a:
1719						s['CM'][j,k_active] = CM[i,:]
1720						i += 1
1721
1722			if not weighted_sessions:
1723				w = self.rmswd()['rmswd']
1724				for r in self:
1725						r[f'wD{self._4x}'] *= w
1726						r[f'wD{self._4x}raw'] *= w
1727				for session in self.sessions:
1728					self.sessions[session]['CM'] *= w**2
1729
1730			for session in self.sessions:
1731				s = self.sessions[session]
1732				s['SE_a'] = s['CM'][0,0]**.5
1733				s['SE_b'] = s['CM'][1,1]**.5
1734				s['SE_c'] = s['CM'][2,2]**.5
1735				s['SE_a2'] = s['CM'][3,3]**.5
1736				s['SE_b2'] = s['CM'][4,4]**.5
1737				s['SE_c2'] = s['CM'][5,5]**.5
1738
1739			if not weighted_sessions:
1740				self.Nf = len(self) - len(self.unknowns) - np.sum([self.sessions[s]['Np'] for s in self.sessions])
1741			else:
1742				self.Nf = 0
1743				for sg in weighted_sessions:
1744					self.Nf += self.rmswd(sessions = sg)['Nf']
1745
1746			self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
1747
1748			avgD4x = {
1749				sample: np.mean([r[f'D{self._4x}'] for r in self if r['Sample'] == sample])
1750				for sample in self.samples
1751				}
1752			chi2 = np.sum([(r[f'D{self._4x}'] - avgD4x[r['Sample']])**2 for r in self])
1753			rD4x = (chi2/self.Nf)**.5
1754			self.repeatability[f'sigma_{self._4x}'] = rD4x
1755
1756			if consolidate:
1757				self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
1758
1759
1760	def standardization_error(self, session, d4x, D4x, t = 0):
1761		'''
1762		Compute standardization error for a given session and
1763	(δ4x, Δ4x) composition.
1764		'''
1765		a = self.sessions[session]['a']
1766		b = self.sessions[session]['b']
1767		c = self.sessions[session]['c']
1768		a2 = self.sessions[session]['a2']
1769		b2 = self.sessions[session]['b2']
1770		c2 = self.sessions[session]['c2']
1771		CM = self.sessions[session]['CM']
1772
1773		x, y = D4x, d4x
1774		z = a * x + b * y + c + a2 * x * t + b2 * y * t + c2 * t
1775# 		x = (z - b*y - b2*y*t - c - c2*t) / (a+a2*t)
1776		dxdy = -(b+b2*t) / (a+a2*t)
1777		dxdz = 1. / (a+a2*t)
1778		dxda = -x / (a+a2*t)
1779		dxdb = -y / (a+a2*t)
1780		dxdc = -1. / (a+a2*t)
1781	dxda2 = -x * t / (a+a2*t)
1782		dxdb2 = -y * t / (a+a2*t)
1783		dxdc2 = -t / (a+a2*t)
1784		V = np.array([dxda, dxdb, dxdc, dxda2, dxdb2, dxdc2])
1785		sx = (V @ CM @ V.T) ** .5
1786		return sx
1787
1788
1789	@make_verbal
1790	def summary(self,
1791		dir = 'output',
1792		filename = None,
1793		save_to_file = True,
1794		print_out = True,
1795		):
1796		'''
1797	Print out and/or save to disk a summary of the standardization results.
1798
1799		**Parameters**
1800
1801		+ `dir`: the directory in which to save the table
1802	+ `filename`: the name of the csv file to write to
1803		+ `save_to_file`: whether to save the table to disk
1804		+ `print_out`: whether to print out the table
1805		'''
1806
1807		out = []
1808		out += [['N samples (anchors + unknowns)', f"{len(self.samples)} ({len(self.anchors)} + {len(self.unknowns)})"]]
1809		out += [['N analyses (anchors + unknowns)', f"{len(self)} ({len([r for r in self if r['Sample'] in self.anchors])} + {len([r for r in self if r['Sample'] in self.unknowns])})"]]
1810		out += [['Repeatability of δ13C_VPDB', f"{1000 * self.repeatability['r_d13C_VPDB']:.1f} ppm"]]
1811		out += [['Repeatability of δ18O_VSMOW', f"{1000 * self.repeatability['r_d18O_VSMOW']:.1f} ppm"]]
1812		out += [[f'Repeatability of Δ{self._4x} (anchors)', f"{1000 * self.repeatability[f'r_D{self._4x}a']:.1f} ppm"]]
1813		out += [[f'Repeatability of Δ{self._4x} (unknowns)', f"{1000 * self.repeatability[f'r_D{self._4x}u']:.1f} ppm"]]
1814		out += [[f'Repeatability of Δ{self._4x} (all)', f"{1000 * self.repeatability[f'r_D{self._4x}']:.1f} ppm"]]
1815		out += [['Model degrees of freedom', f"{self.Nf}"]]
1816		out += [['Student\'s 95% t-factor', f"{self.t95:.2f}"]]
1817		out += [['Standardization method', self.standardization_method]]
1818
1819		if save_to_file:
1820			if not os.path.exists(dir):
1821				os.makedirs(dir)
1822			if filename is None:
1823				filename = f'D{self._4x}_summary.csv'
1824			with open(f'{dir}/{filename}', 'w') as fid:
1825				fid.write(make_csv(out))
1826		if print_out:
1827			self.msg('\n' + pretty_table(out, header = 0))
1828
1829
1830	@make_verbal
1831	def table_of_sessions(self,
1832		dir = 'output',
1833		filename = None,
1834		save_to_file = True,
1835		print_out = True,
1836		output = None,
1837		):
1838		'''
1839	Print out and/or save to disk a table of sessions.
1840
1841		**Parameters**
1842
1843		+ `dir`: the directory in which to save the table
1844	+ `filename`: the name of the csv file to write to
1845		+ `save_to_file`: whether to save the table to disk
1846		+ `print_out`: whether to print out the table
1847		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
1848	    if set to `'raw'`: return a list of lists of strings
1849		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
1850		'''
1851		include_a2 = any([self.sessions[session]['scrambling_drift'] for session in self.sessions])
1852		include_b2 = any([self.sessions[session]['slope_drift'] for session in self.sessions])
1853		include_c2 = any([self.sessions[session]['wg_drift'] for session in self.sessions])
1854
1855		out = [['Session','Na','Nu','d13Cwg_VPDB','d18Owg_VSMOW','r_d13C','r_d18O',f'r_D{self._4x}','a ± SE','1e3 x b ± SE','c ± SE']]
1856		if include_a2:
1857			out[-1] += ['a2 ± SE']
1858		if include_b2:
1859			out[-1] += ['b2 ± SE']
1860		if include_c2:
1861			out[-1] += ['c2 ± SE']
1862		for session in self.sessions:
1863			out += [[
1864				session,
1865				f"{self.sessions[session]['Na']}",
1866				f"{self.sessions[session]['Nu']}",
1867				f"{self.sessions[session]['d13Cwg_VPDB']:.3f}",
1868				f"{self.sessions[session]['d18Owg_VSMOW']:.3f}",
1869				f"{self.sessions[session]['r_d13C_VPDB']:.4f}",
1870				f"{self.sessions[session]['r_d18O_VSMOW']:.4f}",
1871				f"{self.sessions[session][f'r_D{self._4x}']:.4f}",
1872				f"{self.sessions[session]['a']:.3f} ± {self.sessions[session]['SE_a']:.3f}",
1873				f"{1e3*self.sessions[session]['b']:.3f} ± {1e3*self.sessions[session]['SE_b']:.3f}",
1874				f"{self.sessions[session]['c']:.3f} ± {self.sessions[session]['SE_c']:.3f}",
1875				]]
1876			if include_a2:
1877				if self.sessions[session]['scrambling_drift']:
1878					out[-1] += [f"{self.sessions[session]['a2']:.1e} ± {self.sessions[session]['SE_a2']:.1e}"]
1879				else:
1880					out[-1] += ['']
1881			if include_b2:
1882				if self.sessions[session]['slope_drift']:
1883					out[-1] += [f"{self.sessions[session]['b2']:.1e} ± {self.sessions[session]['SE_b2']:.1e}"]
1884				else:
1885					out[-1] += ['']
1886			if include_c2:
1887				if self.sessions[session]['wg_drift']:
1888					out[-1] += [f"{self.sessions[session]['c2']:.1e} ± {self.sessions[session]['SE_c2']:.1e}"]
1889				else:
1890					out[-1] += ['']
1891
1892		if save_to_file:
1893			if not os.path.exists(dir):
1894				os.makedirs(dir)
1895			if filename is None:
1896				filename = f'D{self._4x}_sessions.csv'
1897			with open(f'{dir}/{filename}', 'w') as fid:
1898				fid.write(make_csv(out))
1899		if print_out:
1900			self.msg('\n' + pretty_table(out))
1901		if output == 'raw':
1902			return out
1903		elif output == 'pretty':
1904			return pretty_table(out)
1905
1906
1907	@make_verbal
1908	def table_of_analyses(
1909		self,
1910		dir = 'output',
1911		filename = None,
1912		save_to_file = True,
1913		print_out = True,
1914		output = None,
1915		):
1916		'''
1917	Print out and/or save to disk a table of analyses.
1918
1919		**Parameters**
1920
1921		+ `dir`: the directory in which to save the table
1922	+ `filename`: the name of the csv file to write to
1923		+ `save_to_file`: whether to save the table to disk
1924		+ `print_out`: whether to print out the table
1925		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
1926	    if set to `'raw'`: return a list of lists of strings
1927		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
1928		'''
1929
1930		out = [['UID','Session','Sample']]
1931		extra_fields = [f for f in [('SampleMass','.2f'),('ColdFingerPressure','.1f'),('AcidReactionYield','.3f')] if f[0] in {k for r in self for k in r}]
1932		for f in extra_fields:
1933			out[-1] += [f[0]]
1934		out[-1] += ['d13Cwg_VPDB','d18Owg_VSMOW','d45','d46','d47','d48','d49','d13C_VPDB','d18O_VSMOW','D47raw','D48raw','D49raw',f'D{self._4x}']
1935		for r in self:
1936			out += [[f"{r['UID']}",f"{r['Session']}",f"{r['Sample']}"]]
1937			for f in extra_fields:
1938				out[-1] += [f"{r[f[0]]:{f[1]}}"]
1939			out[-1] += [
1940				f"{r['d13Cwg_VPDB']:.3f}",
1941				f"{r['d18Owg_VSMOW']:.3f}",
1942				f"{r['d45']:.6f}",
1943				f"{r['d46']:.6f}",
1944				f"{r['d47']:.6f}",
1945				f"{r['d48']:.6f}",
1946				f"{r['d49']:.6f}",
1947				f"{r['d13C_VPDB']:.6f}",
1948				f"{r['d18O_VSMOW']:.6f}",
1949				f"{r['D47raw']:.6f}",
1950				f"{r['D48raw']:.6f}",
1951				f"{r['D49raw']:.6f}",
1952				f"{r[f'D{self._4x}']:.6f}"
1953				]
1954		if save_to_file:
1955			if not os.path.exists(dir):
1956				os.makedirs(dir)
1957			if filename is None:
1958				filename = f'D{self._4x}_analyses.csv'
1959			with open(f'{dir}/{filename}', 'w') as fid:
1960				fid.write(make_csv(out))
1961		if print_out:
1962			self.msg('\n' + pretty_table(out))
1963	return pretty_table(out) if output == 'pretty' else out
1964
1965	@make_verbal
1966	def covar_table(
1967		self,
1968		correl = False,
1969		dir = 'output',
1970		filename = None,
1971		save_to_file = True,
1972		print_out = True,
1973		output = None,
1974		):
1975		'''
1976		Print out, save to disk and/or return the variance-covariance matrix of D4x
1977		for all unknown samples.
1978
1979		**Parameters**
1980
1981		+ `dir`: the directory in which to save the csv
1982		+ `filename`: the name of the csv file to write to
1983		+ `save_to_file`: whether to save the csv
1984		+ `print_out`: whether to print out the matrix
1985		+ `output`: if set to `'pretty'`: return a pretty text matrix (see `pretty_table()`);
1986	    if set to `'raw'`: return a list of lists of strings
1987		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
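
**Example** (a sketch): print out the correlation matrix without saving it:

```py
mydata.covar_table(correl = True, save_to_file = False)
```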
1988		'''
1989		samples = sorted([u for u in self.unknowns])
1990		out = [[''] + samples]
1991		for s1 in samples:
1992			out.append([s1])
1993			for s2 in samples:
1994				if correl:
1995					out[-1].append(f'{self.sample_D4x_correl(s1, s2):.6f}')
1996				else:
1997					out[-1].append(f'{self.sample_D4x_covar(s1, s2):.8e}')
1998
1999		if save_to_file:
2000			if not os.path.exists(dir):
2001				os.makedirs(dir)
2002			if filename is None:
2003				if correl:
2004					filename = f'D{self._4x}_correl.csv'
2005				else:
2006					filename = f'D{self._4x}_covar.csv'
2007			with open(f'{dir}/{filename}', 'w') as fid:
2008				fid.write(make_csv(out))
2009		if print_out:
2010			self.msg('\n'+pretty_table(out))
2011		if output == 'raw':
2012			return out
2013		elif output == 'pretty':
2014			return pretty_table(out)
2015
2016	@make_verbal
2017	def table_of_samples(
2018		self,
2019		dir = 'output',
2020		filename = None,
2021		save_to_file = True,
2022		print_out = True,
2023		output = None,
2024		):
2025		'''
2026		Print out, save to disk and/or return a table of samples.
2027
2028		**Parameters**
2029
2030		+ `dir`: the directory in which to save the csv
2031		+ `filename`: the name of the csv file to write to
2032		+ `save_to_file`: whether to save the csv
2033		+ `print_out`: whether to print out the table
2034		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
2035	    if set to `'raw'`: return a list of lists of strings
2036		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
2037		'''
2038
2039		out = [['Sample','N','d13C_VPDB','d18O_VSMOW',f'D{self._4x}','SE','95% CL','SD','p_Levene']]
2040		for sample in self.anchors:
2041			out += [[
2042				f"{sample}",
2043				f"{self.samples[sample]['N']}",
2044				f"{self.samples[sample]['d13C_VPDB']:.2f}",
2045				f"{self.samples[sample]['d18O_VSMOW']:.2f}",
2046				f"{self.samples[sample][f'D{self._4x}']:.4f}",'','',
2047				f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '', ''
2048				]]
2049		for sample in self.unknowns:
2050			out += [[
2051				f"{sample}",
2052				f"{self.samples[sample]['N']}",
2053				f"{self.samples[sample]['d13C_VPDB']:.2f}",
2054				f"{self.samples[sample]['d18O_VSMOW']:.2f}",
2055				f"{self.samples[sample][f'D{self._4x}']:.4f}",
2056				f"{self.samples[sample][f'SE_D{self._4x}']:.4f}",
2057	f"{self.samples[sample][f'SE_D{self._4x}'] * self.t95:.4f}",
2058				f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '',
2059				f"{self.samples[sample]['p_Levene']:.3f}" if self.samples[sample]['N'] > 2 else ''
2060				]]
2061		if save_to_file:
2062			if not os.path.exists(dir):
2063				os.makedirs(dir)
2064			if filename is None:
2065				filename = f'D{self._4x}_samples.csv'
2066			with open(f'{dir}/{filename}', 'w') as fid:
2067				fid.write(make_csv(out))
2068		if print_out:
2069			self.msg('\n'+pretty_table(out))
2070		if output == 'raw':
2071			return out
2072		elif output == 'pretty':
2073			return pretty_table(out)
2074
2075
2076	def plot_sessions(self, dir = 'output', figsize = (8,8), filetype = 'pdf', dpi = 100):
2077		'''
2078		Generate session plots and save them to disk.
2079
2080		**Parameters**
2081
2082		+ `dir`: the directory in which to save the plots
2083		+ `figsize`: the width and height (in inches) of each plot
2084		+ `filetype`: 'pdf' or 'png'
2085		+ `dpi`: resolution for PNG output
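
**Example** (a sketch):

```py
mydata.plot_sessions(dir = 'output', filetype = 'png', dpi = 200)
```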
2086		'''
2087		if not os.path.exists(dir):
2088			os.makedirs(dir)
2089
2090		for session in self.sessions:
2091			sp = self.plot_single_session(session, xylimits = 'constant')
2092			ppl.savefig(f'{dir}/D{self._4x}_plot_{session}.{filetype}', **({'dpi': dpi} if filetype.lower() == 'png' else {}))
2093			ppl.close(sp.fig)
2094
2095
2096	@make_verbal
2097	def consolidate_samples(self):
2098		'''
2099		Compile various statistics for each sample.
2100
2101		For each anchor sample:
2102
2103		+ `D47` or `D48`: the nominal Δ4x value for this anchor, specified by `self.Nominal_D4x`
2104		+ `SE_D47` or `SE_D48`: set to zero by definition
2105
2106		For each unknown sample:
2107
2108		+ `D47` or `D48`: the standardized Δ4x value for this unknown
2109		+ `SE_D47` or `SE_D48`: the standard error of Δ4x for this unknown
2110
2111		For each anchor and unknown:
2112
2113		+ `N`: the total number of analyses of this sample
2114		+ `SD_D47` or `SD_D48`: the “sample” (in the statistical sense) standard deviation for this sample
2115		+ `d13C_VPDB`: the average δ13C_VPDB value for this sample
2116		+ `d18O_VSMOW`: the average δ18O_VSMOW value for this sample (as CO2)
2117		+ `p_Levene`: the p-value from a [Levene test](https://en.wikipedia.org/wiki/Levene%27s_test) of equal
2118	variance, indicating whether the Δ4x repeatability of this sample differs significantly from
2119		that observed for the reference sample specified by `self.LEVENE_REF_SAMPLE`.
2120		'''
2121		D4x_ref_pop = [r[f'D{self._4x}'] for r in self.samples[self.LEVENE_REF_SAMPLE]['data']]
2122		for sample in self.samples:
2123			self.samples[sample]['N'] = len(self.samples[sample]['data'])
2124			if self.samples[sample]['N'] > 1:
2125				self.samples[sample][f'SD_D{self._4x}'] = stdev([r[f'D{self._4x}'] for r in self.samples[sample]['data']])
2126
2127			self.samples[sample]['d13C_VPDB'] = np.mean([r['d13C_VPDB'] for r in self.samples[sample]['data']])
2128			self.samples[sample]['d18O_VSMOW'] = np.mean([r['d18O_VSMOW'] for r in self.samples[sample]['data']])
2129
2130			D4x_pop = [r[f'D{self._4x}'] for r in self.samples[sample]['data']]
2131			if len(D4x_pop) > 2:
2132				self.samples[sample]['p_Levene'] = levene(D4x_ref_pop, D4x_pop, center = 'median')[1]
2133			
2134		if self.standardization_method == 'pooled':
2135			for sample in self.anchors:
2136				self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
2137				self.samples[sample][f'SE_D{self._4x}'] = 0.
2138			for sample in self.unknowns:
2139				self.samples[sample][f'D{self._4x}'] = self.standardization.params.valuesdict()[f'D{self._4x}_{pf(sample)}']
2140				try:
2141					self.samples[sample][f'SE_D{self._4x}'] = self.sample_D4x_covar(sample)**.5
2142				except ValueError:
2143					# when `sample` is constrained by self.standardize(constraints = {...}),
2144					# it is no longer listed in self.standardization.var_names.
2145					# Temporary fix: define SE as zero for now
2146					self.samples[sample][f'SE_D{self._4x}'] = 0.
2147
2148		elif self.standardization_method == 'indep_sessions':
2149			for sample in self.anchors:
2150				self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
2151				self.samples[sample][f'SE_D{self._4x}'] = 0.
2152			for sample in self.unknowns:
2153				self.msg(f'Consolidating sample {sample}')
2154				self.unknowns[sample][f'session_D{self._4x}'] = {}
2155				session_avg = []
2156				for session in self.sessions:
2157					sdata = [r for r in self.sessions[session]['data'] if r['Sample'] == sample]
2158					if sdata:
2159						self.msg(f'{sample} found in session {session}')
2160						avg_D4x = np.mean([r[f'D{self._4x}'] for r in sdata])
2161						avg_d4x = np.mean([r[f'd{self._4x}'] for r in sdata])
2162						# !! TODO: sigma_s below does not account for temporal changes in standardization error
2163						sigma_s = self.standardization_error(session, avg_d4x, avg_D4x)
2164						sigma_u = sdata[0][f'wD{self._4x}raw'] / self.sessions[session]['a'] / len(sdata)**.5
2165						session_avg.append([avg_D4x, (sigma_u**2 + sigma_s**2)**.5])
2166						self.unknowns[sample][f'session_D{self._4x}'][session] = session_avg[-1]
2167				self.samples[sample][f'D{self._4x}'], self.samples[sample][f'SE_D{self._4x}'] = w_avg(*zip(*session_avg))
2168				weights = {s: self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 for s in self.unknowns[sample][f'session_D{self._4x}']}
2169				wsum = sum([weights[s] for s in weights])
2170				for s in weights:
2171					self.unknowns[sample][f'session_D{self._4x}'][s] += [self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 / wsum]
2172
2173		for r in self:
2174			r[f'D{self._4x}_residual'] = r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}']
2175
2176
2177
2178	def consolidate_sessions(self):
2179		'''
2180		Compute various statistics for each session.
2181
2182		+ `Na`: Number of anchor analyses in the session
2183		+ `Nu`: Number of unknown analyses in the session
2184		+ `r_d13C_VPDB`: δ13C_VPDB repeatability of analyses within the session
2185		+ `r_d18O_VSMOW`: δ18O_VSMOW repeatability of analyses within the session
2186		+ `r_D47` or `r_D48`: Δ4x repeatability of analyses within the session
2187		+ `a`: scrambling factor
2188		+ `b`: compositional slope
2189		+ `c`: WG offset
2190	+ `SE_a`: Model standard error of `a`
2191	+ `SE_b`: Model standard error of `b`
2192	+ `SE_c`: Model standard error of `c`
2193		+ `scrambling_drift` (boolean): whether to allow a temporal drift in the scrambling factor (`a`)
2194		+ `slope_drift` (boolean): whether to allow a temporal drift in the compositional slope (`b`)
2195		+ `wg_drift` (boolean): whether to allow a temporal drift in the WG offset (`c`)
2196		+ `a2`: scrambling factor drift
2197		+ `b2`: compositional slope drift
2198		+ `c2`: WG offset drift
2199		+ `Np`: Number of standardization parameters to fit
2200		+ `CM`: model covariance matrix for (`a`, `b`, `c`, `a2`, `b2`, `c2`)
2201		+ `d13Cwg_VPDB`: δ13C_VPDB of WG
2202		+ `d18Owg_VSMOW`: δ18O_VSMOW of WG
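
	**Example**

	A sketch, assuming a (hypothetical) session named `Session01` in a
	standardized `D47data` instance `mydata`:

	```py
	mydata.sessions['Session01']['a']     # scrambling factor
	mydata.sessions['Session01']['SE_a']  # its model standard error
	```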
2203		'''
2204		for session in self.sessions:
2205			if 'd13Cwg_VPDB' not in self.sessions[session]:
2206				self.sessions[session]['d13Cwg_VPDB'] = self.sessions[session]['data'][0]['d13Cwg_VPDB']
2207			if 'd18Owg_VSMOW' not in self.sessions[session]:
2208				self.sessions[session]['d18Owg_VSMOW'] = self.sessions[session]['data'][0]['d18Owg_VSMOW']
2209			self.sessions[session]['Na'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.anchors])
2210			self.sessions[session]['Nu'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns])
2211
2212			self.msg(f'Computing repeatabilities for session {session}')
2213			self.sessions[session]['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors', sessions = [session])
2214			self.sessions[session]['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors', sessions = [session])
2215			self.sessions[session][f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', sessions = [session])
2216
2217		if self.standardization_method == 'pooled':
2218			for session in self.sessions:
2219
2220				self.sessions[session]['a'] = self.standardization.params.valuesdict()[f'a_{pf(session)}']
2221				i = self.standardization.var_names.index(f'a_{pf(session)}')
2222				self.sessions[session]['SE_a'] = self.standardization.covar[i,i]**.5
2223
2224				self.sessions[session]['b'] = self.standardization.params.valuesdict()[f'b_{pf(session)}']
2225				i = self.standardization.var_names.index(f'b_{pf(session)}')
2226				self.sessions[session]['SE_b'] = self.standardization.covar[i,i]**.5
2227
2228				self.sessions[session]['c'] = self.standardization.params.valuesdict()[f'c_{pf(session)}']
2229				i = self.standardization.var_names.index(f'c_{pf(session)}')
2230				self.sessions[session]['SE_c'] = self.standardization.covar[i,i]**.5
2231
2232				self.sessions[session]['a2'] = self.standardization.params.valuesdict()[f'a2_{pf(session)}']
2233				if self.sessions[session]['scrambling_drift']:
2234					i = self.standardization.var_names.index(f'a2_{pf(session)}')
2235					self.sessions[session]['SE_a2'] = self.standardization.covar[i,i]**.5
2236				else:
2237					self.sessions[session]['SE_a2'] = 0.
2238
2239				self.sessions[session]['b2'] = self.standardization.params.valuesdict()[f'b2_{pf(session)}']
2240				if self.sessions[session]['slope_drift']:
2241					i = self.standardization.var_names.index(f'b2_{pf(session)}')
2242					self.sessions[session]['SE_b2'] = self.standardization.covar[i,i]**.5
2243				else:
2244					self.sessions[session]['SE_b2'] = 0.
2245
2246				self.sessions[session]['c2'] = self.standardization.params.valuesdict()[f'c2_{pf(session)}']
2247				if self.sessions[session]['wg_drift']:
2248					i = self.standardization.var_names.index(f'c2_{pf(session)}')
2249					self.sessions[session]['SE_c2'] = self.standardization.covar[i,i]**.5
2250				else:
2251					self.sessions[session]['SE_c2'] = 0.
2252
2253				i = self.standardization.var_names.index(f'a_{pf(session)}')
2254				j = self.standardization.var_names.index(f'b_{pf(session)}')
2255				k = self.standardization.var_names.index(f'c_{pf(session)}')
2256				CM = np.zeros((6,6))
2257				CM[:3,:3] = self.standardization.covar[[i,j,k],:][:,[i,j,k]]
2258				try:
2259					i2 = self.standardization.var_names.index(f'a2_{pf(session)}')
2260					CM[3,[0,1,2,3]] = self.standardization.covar[i2,[i,j,k,i2]]
2261					CM[[0,1,2,3],3] = self.standardization.covar[[i,j,k,i2],i2]
2262					try:
2263						j2 = self.standardization.var_names.index(f'b2_{pf(session)}')
2264						CM[3,4] = self.standardization.covar[i2,j2]
2265						CM[4,3] = self.standardization.covar[j2,i2]
2266					except ValueError:
2267						pass
2268					try:
2269						k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2270						CM[3,5] = self.standardization.covar[i2,k2]
2271						CM[5,3] = self.standardization.covar[k2,i2]
2272					except ValueError:
2273						pass
2274				except ValueError:
2275					pass
2276				try:
2277					j2 = self.standardization.var_names.index(f'b2_{pf(session)}')
2278					CM[4,[0,1,2,4]] = self.standardization.covar[j2,[i,j,k,j2]]
2279					CM[[0,1,2,4],4] = self.standardization.covar[[i,j,k,j2],j2]
2280					try:
2281						k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2282						CM[4,5] = self.standardization.covar[j2,k2]
2283						CM[5,4] = self.standardization.covar[k2,j2]
2284					except ValueError:
2285						pass
2286				except ValueError:
2287					pass
2288				try:
2289					k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2290					CM[5,[0,1,2,5]] = self.standardization.covar[k2,[i,j,k,k2]]
2291					CM[[0,1,2,5],5] = self.standardization.covar[[i,j,k,k2],k2]
2292				except ValueError:
2293					pass
2294
2295				self.sessions[session]['CM'] = CM
2296
2297		elif self.standardization_method == 'indep_sessions':
2298			pass # Not implemented yet
2299
2300
2301	@make_verbal
2302	def repeatabilities(self):
2303		'''
2304		Compute analytical repeatabilities for δ13C_VPDB, δ18O_VSMOW, Δ4x
2305		(for all samples, for anchors, and for unknowns).
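
		For example, for a standardized `D47data` instance `mydata`:

		```py
		mydata.repeatabilities()
		print(mydata.repeatability['r_D47'])  # Δ47 repeatability of all samples, in ‰
		```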
2306		'''
2307		self.msg('Computing repeatabilities for all sessions')
2308
2309		self.repeatability['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors')
2310		self.repeatability['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors')
2311		self.repeatability[f'r_D{self._4x}a'] = self.compute_r(f'D{self._4x}', samples = 'anchors')
2312		self.repeatability[f'r_D{self._4x}u'] = self.compute_r(f'D{self._4x}', samples = 'unknowns')
2313		self.repeatability[f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', samples = 'all samples')
2314
2315
2316	@make_verbal
2317	def consolidate(self, tables = True, plots = True):
2318		'''
2319		Collect information about samples, sessions and repeatabilities.
2320		'''
2321		self.consolidate_samples()
2322		self.consolidate_sessions()
2323		self.repeatabilities()
2324
2325		if tables:
2326			self.summary()
2327			self.table_of_sessions()
2328			self.table_of_analyses()
2329			self.table_of_samples()
2330
2331		if plots:
2332			self.plot_sessions()
2333
2334
2335	@make_verbal
2336	def rmswd(self,
2337		samples = 'all samples',
2338		sessions = 'all sessions',
2339		):
2340		'''
2341		Compute the χ2, the root mean squared weighted deviation
2342		(i.e. the square root of the reduced χ2), and the corresponding degrees of freedom of the
2343		Δ4x values for samples in `samples` and sessions in `sessions`.
2344		
2345		Only used in `D4xdata.standardize()` with `method='indep_sessions'`.
2346		'''
2347		if samples == 'all samples':
2348			mysamples = [k for k in self.samples]
2349		elif samples == 'anchors':
2350			mysamples = [k for k in self.anchors]
2351		elif samples == 'unknowns':
2352			mysamples = [k for k in self.unknowns]
2353		else:
2354			mysamples = samples
2355
2356		if sessions == 'all sessions':
2357			sessions = [k for k in self.sessions]
2358
2359		chisq, Nf = 0, 0
2360		for sample in mysamples :
2361			G = [ r for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2362			if len(G) > 1 :
2363				X, sX = w_avg([r[f'D{self._4x}'] for r in G], [r[f'wD{self._4x}'] for r in G])
2364				Nf += (len(G) - 1)
2365				chisq += np.sum([ ((r[f'D{self._4x}']-X)/r[f'wD{self._4x}'])**2 for r in G])
2366		r = (chisq / Nf)**.5 if Nf > 0 else 0
2367		self.msg(f'RMSWD of r["D{self._4x}"] is {r:.6f} for {samples}.')
2368		return {'rmswd': r, 'chisq': chisq, 'Nf': Nf}
2369
2370	
2371	@make_verbal
2372	def compute_r(self, key, samples = 'all samples', sessions = 'all sessions'):
2373		'''
2374		Compute the repeatability of `[r[key] for r in self]`
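
		For example, `self.compute_r('D47', samples = 'anchors')` returns the pooled
		Δ47 repeatability (in ‰) of all anchor analyses.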
2375		'''
2376
2377		if samples == 'all samples':
2378			mysamples = [k for k in self.samples]
2379		elif samples == 'anchors':
2380			mysamples = [k for k in self.anchors]
2381		elif samples == 'unknowns':
2382			mysamples = [k for k in self.unknowns]
2383		else:
2384			mysamples = samples
2385
2386		if sessions == 'all sessions':
2387			sessions = [k for k in self.sessions]
2388
2389		if key in ['D47', 'D48']:
2390			# Full disclosure: the definition of Nf is tricky/debatable
2391			G = [r for r in self if r['Sample'] in mysamples and r['Session'] in sessions]
2392			chisq = (np.array([r[f'{key}_residual'] for r in G])**2).sum()
2393			Nf = len(G)
2394# 			print(f'len(G) = {Nf}')
2395			Nf -= len([s for s in mysamples if s in self.unknowns])
2396# 			print(f'{len([s for s in mysamples if s in self.unknowns])} unknown samples to consider')
2397			for session in sessions:
2398				Np = len([
2399					_ for _ in self.standardization.params
2400					if (
2401						self.standardization.params[_].expr is not None
2402						and (
2403							(_[0] in 'abc' and _[1] == '_' and _[2:] == pf(session))
2404							or (_[0] in 'abc' and _[1:3] == '2_' and _[3:] == pf(session))
2405							)
2406						)
2407					])
2408# 				print(f'session {session}: {Np} parameters to consider')
2409				Na = len({
2410					r['Sample'] for r in self.sessions[session]['data']
2411					if r['Sample'] in self.anchors and r['Sample'] in mysamples
2412					})
2413# 				print(f'session {session}: {Na} different anchors in that session')
2414				Nf -= min(Np, Na)
2415# 			print(f'Nf = {Nf}')
2416
2417# 			for sample in mysamples :
2418# 				X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2419# 				if len(X) > 1 :
2420# 					chisq += np.sum([ (x-self.samples[sample][key])**2 for x in X ])
2421# 					if sample in self.unknowns:
2422# 						Nf += len(X) - 1
2423# 					else:
2424# 						Nf += len(X)
2425# 			if samples in ['anchors', 'all samples']:
2426# 				Nf -= sum([self.sessions[s]['Np'] for s in sessions])
2427			r = (chisq / Nf)**.5 if Nf > 0 else 0
2428
2429		else: # if key not in ['D47', 'D48']
2430			chisq, Nf = 0, 0
2431			for sample in mysamples :
2432				X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2433				if len(X) > 1 :
2434					Nf += len(X) - 1
2435					chisq += np.sum([ (x-np.mean(X))**2 for x in X ])
2436			r = (chisq / Nf)**.5 if Nf > 0 else 0
2437
2438		self.msg(f'Repeatability of r["{key}"] is {1000*r:.1f} ppm for {samples}.')
2439		return r
2440
2441	def sample_average(self, samples, weights = 'equal', normalize = True):
2442		'''
2443		Weighted average Δ4x value of a group of samples, accounting for covariance.
2444
2445		Returns the weighted average Δ4x value and associated SE
2446		of a group of samples. Weights are equal by default. If `normalize` is
2447		true, `weights` will be rescaled so that their sum equals 1.
2448
2449		**Examples**
2450
2451		```python
2452		self.sample_average(['X','Y'], [1, 2])
2453		```
2454
2455		returns the value and SE of [Δ4x(X) + 2 Δ4x(Y)]/3,
2456		where Δ4x(X) and Δ4x(Y) are the average Δ4x
2457		values of samples X and Y, respectively.
2458
2459		```python
2460		self.sample_average(['X','Y'], [1, -1], normalize = False)
2461		```
2462
2463		returns the value and SE of the difference Δ4x(X) - Δ4x(Y).
2464		'''
2465		if weights == 'equal':
2466			weights = [1/len(samples)] * len(samples)
2467
2468		if normalize:
2469			s = sum(weights)
2470			if s:
2471				weights = [w/s for w in weights]
2472
2473		try:
2474# 			indices = [self.standardization.var_names.index(f'D47_{pf(sample)}') for sample in samples]
2475# 			C = self.standardization.covar[indices,:][:,indices]
2476			C = np.array([[self.sample_D4x_covar(x, y) for x in samples] for y in samples])
2477			X = [self.samples[sample][f'D{self._4x}'] for sample in samples]
2478			return correlated_sum(X, C, weights)
2479		except ValueError:
2480			return (0., 0.)
2481
2482
2483	def sample_D4x_covar(self, sample1, sample2 = None):
2484		'''
2485		Covariance between Δ4x values of samples
2486
2487		Returns the error covariance between the average Δ4x values of two
2488		samples. If only `sample1` is specified, or if `sample1 == sample2`,
2489		returns the Δ4x variance for that sample.
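
		**Example**

		A sketch with hypothetical sample names, assuming `mydata` has already
		been standardized:

		```py
		var_foo = mydata.sample_D4x_covar('FOO')       # Δ4x variance of sample FOO
		covar = mydata.sample_D4x_covar('FOO', 'BAR')  # Δ4x error covariance of FOO and BAR
		```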
2490		'''
2491		if sample2 is None:
2492			sample2 = sample1
2493		if self.standardization_method == 'pooled':
2494			i = self.standardization.var_names.index(f'D{self._4x}_{pf(sample1)}')
2495			j = self.standardization.var_names.index(f'D{self._4x}_{pf(sample2)}')
2496			return self.standardization.covar[i, j]
2497		elif self.standardization_method == 'indep_sessions':
2498			if sample1 == sample2:
2499				return self.samples[sample1][f'SE_D{self._4x}']**2
2500			else:
2501				c = 0
2502				for session in self.sessions:
2503					sdata1 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample1]
2504					sdata2 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample2]
2505					if sdata1 and sdata2:
2506						a = self.sessions[session]['a']
2507						# !! TODO: CM below does not account for temporal changes in standardization parameters
2508						CM = self.sessions[session]['CM'][:3,:3]
2509						avg_D4x_1 = np.mean([r[f'D{self._4x}'] for r in sdata1])
2510						avg_d4x_1 = np.mean([r[f'd{self._4x}'] for r in sdata1])
2511						avg_D4x_2 = np.mean([r[f'D{self._4x}'] for r in sdata2])
2512						avg_d4x_2 = np.mean([r[f'd{self._4x}'] for r in sdata2])
2513						c += (
2514							self.unknowns[sample1][f'session_D{self._4x}'][session][2]
2515							* self.unknowns[sample2][f'session_D{self._4x}'][session][2]
2516							* np.array([[avg_D4x_1, avg_d4x_1, 1]])
2517							@ CM
2518							@ np.array([[avg_D4x_2, avg_d4x_2, 1]]).T
2519							) / a**2
2520				return float(c)
2521
2522	def sample_D4x_correl(self, sample1, sample2 = None):
2523		'''
2524		Correlation between Δ4x errors of samples
2525
2526		Returns the error correlation between the average Δ4x values of two samples.
2527		'''
2528		if sample2 is None or sample2 == sample1:
2529			return 1.
2530		return (
2531			self.sample_D4x_covar(sample1, sample2)
2532			/ self.unknowns[sample1][f'SE_D{self._4x}']
2533			/ self.unknowns[sample2][f'SE_D{self._4x}']
2534			)
2535
2536	def plot_single_session(self,
2537		session,
2538		kw_plot_anchors = dict(ls='None', marker='x', mec=(.75, 0, 0), mew = .75, ms = 4),
2539		kw_plot_unknowns = dict(ls='None', marker='x', mec=(0, 0, .75), mew = .75, ms = 4),
2540		kw_plot_anchor_avg = dict(ls='-', marker='None', color=(.75, 0, 0), lw = .75),
2541		kw_plot_unknown_avg = dict(ls='-', marker='None', color=(0, 0, .75), lw = .75),
2542		kw_contour_error = dict(colors = [[0, 0, 0]], alpha = .5, linewidths = 0.75),
2543		xylimits = 'free', # | 'constant'
2544		x_label = None,
2545		y_label = None,
2546		error_contour_interval = 'auto',
2547		fig = 'new',
2548		):
2549		'''
2550		Generate plot for a single session
2551		'''
2552		if x_label is None:
2553			x_label = f'δ$_{{{self._4x}}}$ (‰)'
2554		if y_label is None:
2555			y_label = f'Δ$_{{{self._4x}}}$ (‰)'
2556
2557		out = _SessionPlot()
2558		anchors = [a for a in self.anchors if [r for r in self.sessions[session]['data'] if r['Sample'] == a]]
2559		unknowns = [u for u in self.unknowns if [r for r in self.sessions[session]['data'] if r['Sample'] == u]]
2560		
2561		if fig == 'new':
2562			out.fig = ppl.figure(figsize = (6,6))
2563			ppl.subplots_adjust(.1,.1,.9,.9)
2564
2565		out.anchor_analyses, = ppl.plot(
2566			[r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors],
2567			[r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors],
2568			**kw_plot_anchors)
2569		out.unknown_analyses, = ppl.plot(
2570			[r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns],
2571			[r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns],
2572			**kw_plot_unknowns)
2573		out.anchor_avg = ppl.plot(
2574			np.array([ np.array([
2575				np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
2576				np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
2577				]) for sample in anchors]).T,
2578			np.array([ np.array([0, 0]) + self.Nominal_D4x[sample] for sample in anchors]).T,
2579			**kw_plot_anchor_avg)
2580		out.unknown_avg = ppl.plot(
2581			np.array([ np.array([
2582				np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
2583				np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
2584				]) for sample in unknowns]).T,
2585			np.array([ np.array([0, 0]) + self.unknowns[sample][f'D{self._4x}'] for sample in unknowns]).T,
2586			**kw_plot_unknown_avg)
2587		if xylimits == 'constant':
2588			x = [r[f'd{self._4x}'] for r in self]
2589			y = [r[f'D{self._4x}'] for r in self]
2590			x1, x2, y1, y2 = np.min(x), np.max(x), np.min(y), np.max(y)
2591			w, h = x2-x1, y2-y1
2592			x1 -= w/20
2593			x2 += w/20
2594			y1 -= h/20
2595			y2 += h/20
2596			ppl.axis([x1, x2, y1, y2])
2597		elif xylimits == 'free':
2598			x1, x2, y1, y2 = ppl.axis()
2599		else:
2600			x1, x2, y1, y2 = ppl.axis(xylimits)
2601				
2602		if error_contour_interval != 'none':
2603			xi, yi = np.linspace(x1, x2), np.linspace(y1, y2)
2604			XI,YI = np.meshgrid(xi, yi)
2605			SI = np.array([[self.standardization_error(session, x, y) for x in xi] for y in yi])
2606			if error_contour_interval == 'auto':
2607				rng = np.max(SI) - np.min(SI)
2608				if rng <= 0.01:
2609					cinterval = 0.001
2610				elif rng <= 0.03:
2611					cinterval = 0.004
2612				elif rng <= 0.1:
2613					cinterval = 0.01
2614				elif rng <= 0.3:
2615					cinterval = 0.03
2616				elif rng <= 1.:
2617					cinterval = 0.1
2618				else:
2619					cinterval = 0.5
2620			else:
2621				cinterval = error_contour_interval
2622
2623			cval = np.arange(np.ceil(SI.min() / .001) * .001, np.ceil(SI.max() / .001 + 1) * .001, cinterval)
2624			out.contour = ppl.contour(XI, YI, SI, cval, **kw_contour_error)
2625			out.clabel = ppl.clabel(out.contour)
2626
2627		ppl.xlabel(x_label)
2628		ppl.ylabel(y_label)
2629		ppl.title(session, weight = 'bold')
2630		ppl.grid(alpha = .2)
2631		out.ax = ppl.gca()		
2632
2633		return out
2634
2635	def plot_residuals(
2636		self,
2637		kde = False,
2638		hist = False,
2639		binwidth = 2/3,
2640		dir = 'output',
2641		filename = None,
2642		highlight = [],
2643		colors = None,
2644		figsize = None,
2645		dpi = 100,
2646		yspan = None,
2647		):
2648		'''
2649		Plot residuals of each analysis as a function of time (actually, as a function of
2650		the order of analyses in the `D4xdata` object)
2651
2652		+ `kde`: whether to add a kernel density estimate of residuals
2653		+ `hist`: whether to add a histogram of residuals (incompatible with `kde`)
2654		+ `binwidth`: width of the histogram bins, in units of the Δ4x repeatability (SD)
2655		+ `dir`: the directory in which to save the plot
2656		+ `highlight`: a list of samples to highlight
2657		+ `colors`: a dict of `{<sample>: <color>}` for all samples
2658		+ `figsize`: (width, height) of figure
2659		+ `dpi`: resolution for PNG output
2660		+ `yspan`: factor controlling the range of y values shown in plot
2661		  (by default: `yspan = 1.5 if kde else 1.0`)
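
		**Example**

		```py
		mydata.plot_residuals(filename = '', kde = True)
		```

		saves the plot to `./output/D47_residuals.pdf` (for a `D47data` object),
		since an empty `filename` selects the default file name.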
2662		'''
2663		
2664		from matplotlib import ticker
2665
2666		if yspan is None:
2667			if kde:
2668				yspan = 1.5
2669			else:
2670				yspan = 1.0
2671		
2672		# Layout
2673		fig = ppl.figure(figsize = (8,4) if figsize is None else figsize)
2674		if hist or kde:
2675			ppl.subplots_adjust(left = .08, bottom = .05, right = .98, top = .8, wspace = -0.72)
2676			ax1, ax2 = ppl.subplot(121), ppl.subplot(1,15,15)
2677		else:
2678			ppl.subplots_adjust(.08,.05,.78,.8)
2679			ax1 = ppl.subplot(111)
2680		
2681		# Colors
2682		N = len(self.anchors)
2683		if colors is None:
2684			if len(highlight) > 0:
2685				Nh = len(highlight)
2686				if Nh == 1:
2687					colors = {highlight[0]: (0,0,0)}
2688				elif Nh == 3:
2689					colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0)])}
2690				elif Nh == 4:
2691					colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
2692				else:
2693					colors = {a: hls_to_rgb(k/Nh, .4, 1) for k,a in enumerate(highlight)}
2694			else:
2695				if N == 3:
2696					colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0)])}
2697				elif N == 4:
2698					colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
2699				else:
2700					colors = {a: hls_to_rgb(k/N, .4, 1) for k,a in enumerate(self.anchors)}
2701
2702		ppl.sca(ax1)
2703		
2704		ppl.axhline(0, color = 'k', alpha = .25, lw = 0.75)
2705
2706		ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: f'${x:+.0f}$' if x else '$0$'))
2707
2708		session = self[0]['Session']
2709		x1 = 0
2710# 		ymax = np.max([1e3 * (r['D47'] - self.samples[r['Sample']]['D47']) for r in self])
2711		x_sessions = {}
2712		one_or_more_singlets = False
2713		one_or_more_multiplets = False
2714		multiplets = set()
2715		for k,r in enumerate(self):
2716			if r['Session'] != session:
2717				x2 = k-1
2718				x_sessions[session] = (x1+x2)/2
2719				ppl.axvline(k - 0.5, color = 'k', lw = .5)
2720				session = r['Session']
2721				x1 = k
2722			singlet = len(self.samples[r['Sample']]['data']) == 1
2723			if not singlet:
2724				multiplets.add(r['Sample'])
2725			if r['Sample'] in self.unknowns:
2726				if singlet:
2727					one_or_more_singlets = True
2728				else:
2729					one_or_more_multiplets = True
2730			kw = dict(
2731				marker = 'x' if singlet else '+',
2732				ms = 4 if singlet else 5,
2733				ls = 'None',
2734				mec = colors[r['Sample']] if r['Sample'] in colors else (0,0,0),
2735				mew = 1,
2736				alpha = 0.2 if singlet else 1,
2737				)
2738			if highlight and r['Sample'] not in highlight:
2739				kw['alpha'] = 0.2
2740			ppl.plot(k, 1e3 * r[f'D{self._4x}_residual'], **kw)
2741		x2 = k
2742		x_sessions[session] = (x1+x2)/2
2743
2744		ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000, self.repeatability[f'r_D{self._4x}']*1000, color = 'k', alpha = .05, lw = 1)
2745		ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000*self.t95, self.repeatability[f'r_D{self._4x}']*1000*self.t95, color = 'k', alpha = .05, lw = 1)
2746		if not (hist or kde):
2747			ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000, f"   SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm", size = 9, alpha = 1, va = 'center')
2748			ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000*self.t95, f"   95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm", size = 9, alpha = 1, va = 'center')
2749
2750		xmin, xmax, ymin, ymax = ppl.axis()
2751		if yspan != 1:
2752			ymin, ymax = (ymin + ymax)/2 - yspan * (ymax - ymin)/2, (ymin + ymax)/2 + yspan * (ymax - ymin)/2
2753		for s in x_sessions:
2754			ppl.text(
2755				x_sessions[s],
2756				ymax +1,
2757				s,
2758				va = 'bottom',
2759				**(
2760					dict(ha = 'center')
2761					if len(self.sessions[s]['data']) > (0.15 * len(self))
2762					else dict(ha = 'left', rotation = 45)
2763					)
2764				)
2765
2766		if hist or kde:
2767			ppl.sca(ax2)
2768
2769		for s in colors:
2770			kw['marker'] = '+'
2771			kw['ms'] = 5
2772			kw['mec'] = colors[s]
2773			kw['label'] = s
2774			kw['alpha'] = 1
2775			ppl.plot([], [], **kw)
2776
2777		kw['mec'] = (0,0,0)
2778
2779		if one_or_more_singlets:
2780			kw['marker'] = 'x'
2781			kw['ms'] = 4
2782			kw['alpha'] = .2
2783			kw['label'] = 'other (N$\\,$=$\\,$1)' if one_or_more_multiplets else 'other'
2784			ppl.plot([], [], **kw)
2785
2786		if one_or_more_multiplets:
2787			kw['marker'] = '+'
2788			kw['ms'] = 4
2789			kw['alpha'] = 1
2790			kw['label'] = 'other (N$\\,$>$\\,$1)' if one_or_more_singlets else 'other'
2791			ppl.plot([], [], **kw)
2792
2793		if hist or kde:
2794			leg = ppl.legend(loc = 'upper right', bbox_to_anchor = (1, 1), bbox_transform=fig.transFigure, borderaxespad = 1.5, fontsize = 9)
2795		else:
2796			leg = ppl.legend(loc = 'lower right', bbox_to_anchor = (1, 0), bbox_transform=fig.transFigure, borderaxespad = 1.5)
2797		leg.set_zorder(-1000)
2798
2799		ppl.sca(ax1)
2800
2801		ppl.ylabel(f'Δ$_{{{self._4x}}}$ residuals (ppm)')
2802		ppl.xticks([])
2803		ppl.axis([-1, len(self), None, None])
2804
2805		if hist or kde:
2806			ppl.sca(ax2)
2807			X = 1e3 * np.array([r[f'D{self._4x}_residual'] for r in self if r['Sample'] in multiplets or r['Sample'] in self.anchors])
2808
2809			if kde:
2810				from scipy.stats import gaussian_kde
2811				yi = np.linspace(ymin, ymax, 201)
2812				xi = gaussian_kde(X).evaluate(yi)
2813				ppl.fill_betweenx(yi, xi, xi*0, fc = (0,0,0,.15), lw = 1, ec = (.75,.75,.75,1))
2814# 				ppl.plot(xi, yi, 'k-', lw = 1)
2815			elif hist:
2816				ppl.hist(
2817					X,
2818					orientation = 'horizontal',
2819					histtype = 'stepfilled',
2820					ec = [.4]*3,
2821					fc = [.25]*3,
2822					alpha = .25,
2823					bins = np.linspace(-9e3*self.repeatability[f'r_D{self._4x}'], 9e3*self.repeatability[f'r_D{self._4x}'], int(18/binwidth+1)),
2824					)
2825			ppl.text(0, 0,
2826				f"   SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm\n   95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm",
2827				size = 7.5,
2828				alpha = 1,
2829				va = 'center',
2830				ha = 'left',
2831				)
2832
2833			ppl.axis([0, None, ymin, ymax])
2834			ppl.xticks([])
2835			ppl.yticks([])
2836# 			ax2.spines['left'].set_visible(False)
2837			ax2.spines['right'].set_visible(False)
2838			ax2.spines['top'].set_visible(False)
2839			ax2.spines['bottom'].set_visible(False)
2840
2841		ax1.axis([None, None, ymin, ymax])
2842
2843		if not os.path.exists(dir):
2844			os.makedirs(dir)
2845		if filename is None:
2846			return fig
2847		elif filename == '':
2848			filename = f'D{self._4x}_residuals.pdf'
2849		ppl.savefig(f'{dir}/{filename}', dpi = dpi)
2850		ppl.close(fig)
2851				
2852
2853	def simulate(self, *args, **kwargs):
2854		'''
2855		Legacy function: raises a `DeprecationWarning` pointing to `virtual_data()`.
2856		'''
2857		raise DeprecationWarning('D4xdata.simulate is deprecated and has been replaced by virtual_data()')
2858
2859	def plot_distribution_of_analyses(
2860		self,
2861		dir = 'output',
2862		filename = None,
2863		vs_time = False,
2864		figsize = (6,4),
2865		subplots_adjust = (0.02, 0.13, 0.85, 0.8),
2866		output = None,
2867		dpi = 100,
2868		):
2869		'''
2870		Plot temporal distribution of all analyses in the data set.
2871		
2872		**Parameters**
2873
2874		+ `dir`: the directory in which to save the plot
2875		+ `vs_time`: if `True`, plot as a function of `TimeTag` rather than sequentially.
2876		+ `figsize`: (width, height) of figure
2877		+ `subplots_adjust`: passed to `matplotlib.pyplot.subplots_adjust()`
2878		+ `dpi`: resolution for PNG output
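
		**Example**

		Assuming each analysis in `mydata` includes a `TimeTag` field:

		```py
		mydata.plot_distribution_of_analyses(vs_time = True)
		```

		saves the plot to `./output/D47_distribution_of_analyses.pdf` (for a
		`D47data` object).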
2879		'''
2880
2881		asamples = [s for s in self.anchors]
2882		usamples = [s for s in self.unknowns]
2883		if output is None or output == 'fig':
2884			fig = ppl.figure(figsize = figsize)
2885			ppl.subplots_adjust(*subplots_adjust)
2886		Xmin = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
2887		Xmax = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
2888		Xmax += (Xmax-Xmin)/40
2889		Xmin -= (Xmax-Xmin)/41
2890		for k, s in enumerate(asamples + usamples):
2891			if vs_time:
2892				X = [r['TimeTag'] for r in self if r['Sample'] == s]
2893			else:
2894				X = [x for x,r in enumerate(self) if r['Sample'] == s]
2895			Y = [-k for x in X]
2896			ppl.plot(X, Y, 'o', mec = None, mew = 0, mfc = 'b' if s in usamples else 'r', ms = 3, alpha = .75)
2897			ppl.axhline(-k, color = 'b' if s in usamples else 'r', lw = .5, alpha = .25)
2898			ppl.text(Xmax, -k, f'   {s}', va = 'center', ha = 'left', size = 7, color = 'b' if s in usamples else 'r')
2899		ppl.axis([Xmin, Xmax, -k-1, 1])
2900		ppl.xlabel('\ntime')
2901		ppl.gca().annotate('',
2902			xy = (0.6, -0.02),
2903			xycoords = 'axes fraction',
2904			xytext = (.4, -0.02), 
2905			arrowprops = dict(arrowstyle = "->", color = 'k'),
2906			)
2907			
2908
2909		x2 = -1
2910		for session in self.sessions:
2911			x1 = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
2912			if vs_time:
2913				ppl.axvline(x1, color = 'k', lw = .75)
2914			if x2 > -1:
2915				if not vs_time:
2916					ppl.axvline((x1+x2)/2, color = 'k', lw = .75, alpha = .5)
2917			x2 = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
2918# 			from xlrd import xldate_as_datetime
2919# 			print(session, xldate_as_datetime(x1, 0), xldate_as_datetime(x2, 0))
2920			if vs_time:
2921				ppl.axvline(x2, color = 'k', lw = .75)
2922				ppl.axvspan(x1,x2,color = 'k', zorder = -100, alpha = .15)
2923			ppl.text((x1+x2)/2, 1, f' {session}', ha = 'left', va = 'bottom', rotation = 45, size = 8)
2924
2925		ppl.xticks([])
2926		ppl.yticks([])
2927
2928		if output is None:
2929			if not os.path.exists(dir):
2930				os.makedirs(dir)
2931			if filename is None:
2932				filename = f'D{self._4x}_distribution_of_analyses.pdf'
2933			ppl.savefig(f'{dir}/{filename}', dpi = dpi)
2934			ppl.close(fig)
2935		elif output == 'ax':
2936			return ppl.gca()
2937		elif output == 'fig':
2938			return fig
2939
2940
2941	def plot_bulk_compositions(
2942		self,
2943		samples = None,
2944		dir = 'output/bulk_compositions',
2945		figsize = (6,6),
2946		subplots_adjust = (0.15, 0.12, 0.95, 0.92),
2947		show = False,
2948		sample_color = (0,.5,1),
2949		analysis_color = (.7,.7,.7),
2950		labeldist = 0.3,
2951		radius = 0.05,
2952		):
2953		'''
2954		Plot δ13C_VPDB vs δ18O_VSMOW (of CO2) for all analyses.
2955		
2956		By default, creates a directory `./output/bulk_compositions` where plots for
2957		each sample are saved. Another plot named `__all__.pdf` shows all analyses together.
2958		
2959		
2960		**Parameters**
2961
2962		+ `samples`: Only these samples are processed (by default: all samples).
2963		+ `dir`: where to save the plots
2964		+ `figsize`: (width, height) of figure
2965		+ `subplots_adjust`: passed to `subplots_adjust()`
2966		+ `show`: whether to call `matplotlib.pyplot.show()` on the plot with all samples,
2967		allowing for interactive visualization/exploration in (δ13C, δ18O) space.
2968	+ `sample_color`: color used for sample average markers/labels
2969	+ `analysis_color`: color used for individual analysis markers/labels
2970	+ `labeldist`: distance (in inches) from analysis markers to analysis labels
2971		+ `radius`: radius of the dashed circle providing scale. No circle if `radius = 0`.
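
	**Example**

	```py
	mydata.plot_bulk_compositions(samples = ['MYSAMPLE-1'], show = True)
	```

	saves `MYSAMPLE-1.pdf` and `__all__.pdf` to `./output/bulk_compositions`,
	then displays the combined plot interactively.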
2972		'''
2973
2974		from matplotlib.patches import Ellipse
2975
2976		if samples is None:
2977			samples = [_ for _ in self.samples]
2978
2979		saved = {}
2980
2981		for s in samples:
2982
2983			fig = ppl.figure(figsize = figsize)
2984			fig.subplots_adjust(*subplots_adjust)
2985			ax = ppl.subplot(111)
2986			ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
2987			ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')
2988			ppl.title(s)
2989
2990
2991			XY = np.array([[_['d18O_VSMOW'], _['d13C_VPDB']] for _ in self.samples[s]['data']])
2992			UID = [_['UID'] for _ in self.samples[s]['data']]
2993			XY0 = XY.mean(0)
2994
2995			for xy in XY:
2996				ppl.plot([xy[0], XY0[0]], [xy[1], XY0[1]], '-', lw = 1, color = analysis_color)
2997				
2998			ppl.plot(*XY.T, 'wo', mew = 1, mec = analysis_color)
2999			ppl.plot(*XY0, 'wo', mew = 2, mec = sample_color)
3000			ppl.text(*XY0, f'  {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
3001			saved[s] = [XY, XY0]
3002			
3003			x1, x2, y1, y2 = ppl.axis()
3004			x0, dx = (x1+x2)/2, (x2-x1)/2
3005			y0, dy = (y1+y2)/2, (y2-y1)/2
3006			dx, dy = [max(max(dx, dy), radius)]*2
3007
3008			ppl.axis([
3009				x0 - 1.2*dx,
3010				x0 + 1.2*dx,
3011				y0 - 1.2*dy,
3012				y0 + 1.2*dy,
3013				])			
3014
3015			XY0_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(XY0))
3016
3017			for xy, uid in zip(XY, UID):
3018
3019				xy_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(xy))
3020				vector_in_display_space = xy_in_display_space - XY0_in_display_space
3021
3022				if (vector_in_display_space**2).sum() > 0:
3023
3024					unit_vector_in_display_space = vector_in_display_space / ((vector_in_display_space**2).sum())**0.5
3025					label_vector_in_display_space = vector_in_display_space + unit_vector_in_display_space * labeldist
3026					label_xy_in_display_space = XY0_in_display_space + label_vector_in_display_space
3027					label_xy_in_data_space = ax.transData.inverted().transform(fig.dpi_scale_trans.transform(label_xy_in_display_space))
3028
3029					ppl.text(*label_xy_in_data_space, uid, va = 'center', ha = 'center', color = analysis_color)
3030
3031				else:
3032
3033					ppl.text(*xy, f'{uid}  ', va = 'center', ha = 'right', color = analysis_color)
3034
3035			if radius:
3036				ax.add_artist(Ellipse(
3037					xy = XY0,
3038					width = radius*2,
3039					height = radius*2,
3040					ls = (0, (2,2)),
3041					lw = .7,
3042					ec = analysis_color,
3043					fc = 'None',
3044					))
3045				ppl.text(
3046					XY0[0],
3047					XY0[1]-radius,
3048					f'\n± {radius*1e3:.0f} ppm',
3049					color = analysis_color,
3050					va = 'top',
3051					ha = 'center',
3052					linespacing = 0.4,
3053					size = 8,
3054					)
3055
3056			if not os.path.exists(dir):
3057				os.makedirs(dir)
3058			fig.savefig(f'{dir}/{s}.pdf')
3059			ppl.close(fig)
3060
3061		fig = ppl.figure(figsize = figsize)
3062		fig.subplots_adjust(*subplots_adjust)
3063		ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
3064		ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')
3065
3066		for s in saved:
3067			for xy in saved[s][0]:
3068				ppl.plot([xy[0], saved[s][1][0]], [xy[1], saved[s][1][1]], '-', lw = 1, color = analysis_color)
3069			ppl.plot(*saved[s][0].T, 'wo', mew = 1, mec = analysis_color)
3070			ppl.plot(*saved[s][1], 'wo', mew = 1.5, mec = sample_color)
3071			ppl.text(*saved[s][1], f'  {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
3072
3073		x1, x2, y1, y2 = ppl.axis()
3074		ppl.axis([
3075			x1 - (x2-x1)/10,
3076			x2 + (x2-x1)/10,
3077			y1 - (y2-y1)/10,
3078			y2 + (y2-y1)/10,
3079			])			
3080
3081
3082		if not os.path.exists(dir):
3083			os.makedirs(dir)
3084		fig.savefig(f'{dir}/__all__.pdf')
3085		if show:
3086			ppl.show()
3087		ppl.close(fig)
3088		
3089
3090	def _save_D4x_correl(
3091		self,
3092		samples = None,
3093		dir = 'output',
3094		filename = None,
3095		D4x_precision = 4,
3096		correl_precision = 4,
3097		):
3098		'''
3099		Save D4x values along with their SE and correlation matrix.
3100
3101		**Parameters**
3102
3103		+ `samples`: Only these samples are output (by default: all samples).
3104	+ `dir`: the directory in which to save the file (by default: `output`)
3105	+ `filename`: the name of the csv file to write to (by default: `D4x_correl.csv`)
3106		+ `D4x_precision`: the precision to use when writing `D4x` and `D4x_SE` values (by default: 4)
3107		+ `correl_precision`: the precision to use when writing correlation factor values (by default: 4)
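
	**Example**

	With default arguments (all unknowns, default directory and file name):

	```py
	data.save_D4x_correl()
	```

	writes `./output/D4x_correl.csv`, with one row and one correlation column
	per unknown sample.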
3108		'''
3109		if samples is None:
3110			samples = sorted([s for s in self.unknowns])
3111		
3112		out = [['Sample']] + [[s] for s in samples]
3113		out[0] += [f'D{self._4x}', f'D{self._4x}_SE', f'D{self._4x}_correl']
3114		for k,s in enumerate(samples):
3115			out[k+1] += [f'{self.samples[s][f"D{self._4x}"]:.{D4x_precision}f}', f'{self.samples[s][f"SE_D{self._4x}"]:.{D4x_precision}f}']
3116			for s2 in samples:
3117				out[k+1] += [f'{self.sample_D4x_correl(s,s2):.{correl_precision}f}']
3118		
3119		if not os.path.exists(dir):
3120			os.makedirs(dir)
3121		if filename is None:
3122			filename = f'D{self._4x}_correl.csv'
3123		with open(f'{dir}/{filename}', 'w') as fid:
3124			fid.write(make_csv(out))
3125		
3126		
3127		
3128
3129class D47data(D4xdata):
3130	'''
3131	Store and process data for a large set of Δ47 analyses,
3132	usually comprising more than one analytical session.
3133	'''
3134
3135	Nominal_D4x = {
3136		'ETH-1':   0.2052,
3137		'ETH-2':   0.2085,
3138		'ETH-3':   0.6132,
3139		'ETH-4':   0.4511,
3140		'IAEA-C1': 0.3018,
3141		'IAEA-C2': 0.6409,
3142		'MERCK':   0.5135,
3143		} # I-CDES (Bernasconi et al., 2021)
3144	'''
3145	Nominal Δ47 values assigned to the Δ47 anchor samples, used by
3146	`D47data.standardize()` to normalize unknown samples to an absolute Δ47
3147	reference frame.
3148
3149	By default equal to (after [Bernasconi et al. (2021)](https://doi.org/10.1029/2020GC009588)):
3150	```py
3151	{
3152		'ETH-1'   : 0.2052,
3153		'ETH-2'   : 0.2085,
3154		'ETH-3'   : 0.6132,
3155		'ETH-4'   : 0.4511,
3156		'IAEA-C1' : 0.3018,
3157		'IAEA-C2' : 0.6409,
3158		'MERCK'   : 0.5135,
3159	}
3160	```
3161	'''
3162
3163
3164	@property
3165	def Nominal_D47(self):
3166		return self.Nominal_D4x
3167	
3168
3169	@Nominal_D47.setter
3170	def Nominal_D47(self, new):
3171		self.Nominal_D4x = dict(**new)
3172		self.refresh()
3173
3174
3175	def __init__(self, l = [], **kwargs):
3176		'''
3177		**Parameters:** same as `D4xdata.__init__()`
3178		'''
3179		D4xdata.__init__(self, l = l, mass = '47', **kwargs)
3180
3181
3182	def D47fromTeq(self, fCo2eqD47 = 'petersen', priority = 'new'):
3183		'''
3184		Find all samples for which `Teq` is specified, compute equilibrium Δ47
3185	value for that temperature, and treat these samples as additional anchors.
3186
3187		**Parameters**
3188
3189		+ `fCo2eqD47`: Which CO2 equilibrium law to use
3190		(`petersen`: [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127);
3191	`wang`: [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)).
3192		+ `priority`: if `replace`: forget old anchors and only use the new ones;
3193		if `new`: keep pre-existing anchors but update them in case of conflict
3194		between old and new Δ47 values;
3195		if `old`: keep pre-existing anchors but preserve their original Δ47
3196		values in case of conflict.
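
	**Example**

	A sketch, assuming some analyses in `mydata` belong to a (hypothetical)
	sample `EQ-25` and carry a `Teq` field equal to 25:

	```py
	mydata.D47fromTeq()
	```

	adds `EQ-25` to `mydata.Nominal_D47`, assigning it the equilibrium Δ47
	value of CO2 at 25 °C according to Petersen et al. (2019).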
3197		'''
3198		f = {
3199			'petersen': fCO2eqD47_Petersen,
3200			'wang': fCO2eqD47_Wang,
3201			}[fCo2eqD47]
3202		foo = {}
3203		for r in self:
3204			if 'Teq' in r:
3205				if r['Sample'] in foo:
3206					assert foo[r['Sample']] == f(r['Teq']), f'Different values of `Teq` provided for sample `{r["Sample"]}`.'
3207				else:
3208					foo[r['Sample']] = f(r['Teq'])
3209			else:
3210				assert r['Sample'] not in foo, f'`Teq` is inconsistently specified for sample `{r["Sample"]}`.'
3211
3212		if priority == 'replace':
3213			self.Nominal_D47 = {}
3214		for s in foo:
3215			if priority != 'old' or s not in self.Nominal_D47:
3216				self.Nominal_D47[s] = foo[s]
3217	
3218	def save_D47_correl(self, *args, **kwargs):
3219		return self._save_D4x_correl(*args, **kwargs)
3220
3221	save_D47_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D47')
3222
3223
3224class D48data(D4xdata):
3225	'''
3226	Store and process data for a large set of Δ48 analyses,
3227	usually comprising more than one analytical session.
3228	'''
3229
3230	Nominal_D4x = {
3231		'ETH-1':  0.138,
3232		'ETH-2':  0.138,
3233		'ETH-3':  0.270,
3234		'ETH-4':  0.223,
3235		'GU-1':  -0.419,
3236		} # (Fiebig et al., 2019, 2021)
3237	'''
3238	Nominal Δ48 values assigned to the Δ48 anchor samples, used by
3239	`D48data.standardize()` to normalize unknown samples to an absolute Δ48
3240	reference frame.
3241
3242	By default equal to (after [Fiebig et al. (2019)](https://doi.org/10.1016/j.chemgeo.2019.05.019),
3243	[Fiebig et al. (2021)](https://doi.org/10.1016/j.gca.2021.07.012)):
3244
3245	```py
3246	{
3247		'ETH-1' :  0.138,
3248		'ETH-2' :  0.138,
3249		'ETH-3' :  0.270,
3250		'ETH-4' :  0.223,
3251		'GU-1'  : -0.419,
3252	}
3253	```
3254	'''
3255
3256
3257	@property
3258	def Nominal_D48(self):
3259		return self.Nominal_D4x
3260
3261	
3262	@Nominal_D48.setter
3263	def Nominal_D48(self, new):
3264		self.Nominal_D4x = dict(**new)
3265		self.refresh()
3266
3267
3268	def __init__(self, l = [], **kwargs):
3269		'''
3270		**Parameters:** same as `D4xdata.__init__()`
3271		'''
3272		D4xdata.__init__(self, l = l, mass = '48', **kwargs)
3273
3274	def save_D48_correl(self, *args, **kwargs):
3275		return self._save_D4x_correl(*args, **kwargs)
3276
3277	save_D48_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D48')
3278
3279
3280class D49data(D4xdata):
3281	'''
3282	Store and process data for a large set of Δ49 analyses,
3283	usually comprising more than one analytical session.
3284	'''
3285	
3286	Nominal_D4x = {"1000C": 0.0, "25C": 2.228}  # Wang 2004
3287	'''
3288	Nominal Δ49 values assigned to the Δ49 anchor samples, used by
3289	`D49data.standardize()` to normalize unknown samples to an absolute Δ49
3290	reference frame.
3291
3292	By default equal to (after [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)):
3293
3294	```py
3295	{
3296		"1000C": 0.0,
3297		"25C": 2.228
3298	}
3299	```
3300	'''
3301	
3302	@property
3303	def Nominal_D49(self):
3304		return self.Nominal_D4x
3305	
3306	@Nominal_D49.setter
3307	def Nominal_D49(self, new):
3308		self.Nominal_D4x = dict(**new)
3309		self.refresh()
3310	
3311	def __init__(self, l=[], **kwargs):
3312		'''
3313		**Parameters:** same as `D4xdata.__init__()`
3314		'''
3315		D4xdata.__init__(self, l=l, mass='49', **kwargs)
3316	
3317	def save_D49_correl(self, *args, **kwargs):
3318		return self._save_D4x_correl(*args, **kwargs)
3319	
3320	save_D49_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D49')
3321
3322class _SessionPlot():
3323	'''
3324	Simple placeholder class
3325	'''
3326	def __init__(self):
3327		pass
3328
3329_app = typer.Typer(
3330	add_completion = False,
3331	context_settings={'help_option_names': ['-h', '--help']},
3332	rich_markup_mode = 'rich',
3333	)
3334
3335@_app.command()
3336def _cli(
3337	rawdata: Annotated[str, typer.Argument(help = "Specify the path of a rawdata input file")],
3338	exclude: Annotated[str, typer.Option('--exclude', '-e', help = 'The path of a file specifying UIDs and/or Samples to exclude')] = 'none',
3339	anchors: Annotated[str, typer.Option('--anchors', '-a', help = 'The path of a file specifying custom anchors')] = 'none',
3340	output_dir: Annotated[str, typer.Option('--output-dir', '-o', help = 'Specify the output directory')] = 'output',
3341	run_D48: Annotated[bool, typer.Option('--D48', help = 'Also standardize D48')] = False,
3342	):
3343	"""
3344	Process raw D47 data and return standardized results.
3345	
3346	See [b]https://mdaeron.github.io/D47crunch/#3-command-line-interface-cli[/b] for more details.
3347	
3348	Reads raw data from an input file, optionally excluding some samples and/or analyses, then standardizes
3349	the data based either on the default [b]d13C_VPDB[/b], [b]d18O_VPDB[/b], [b]D47[/b], and [b]D48[/b] anchors or on different
3350	user-specified anchors. A new directory (named `output` by default) is created to store the results and
3351	the following sequence is applied:
3352	
3353	* [b]D47data.wg()[/b]
3354	* [b]D47data.crunch()[/b]
3355	* [b]D47data.standardize()[/b]
3356	* [b]D47data.summary()[/b]
3357	* [b]D47data.table_of_samples()[/b]
3358	* [b]D47data.table_of_sessions()[/b]
3359	* [b]D47data.plot_sessions()[/b]
3360	* [b]D47data.plot_residuals()[/b]
3361	* [b]D47data.table_of_analyses()[/b]
3362	* [b]D47data.plot_distribution_of_analyses()[/b]
3363	* [b]D47data.plot_bulk_compositions()[/b]
3364	* [b]D47data.save_D47_correl()[/b]
3365	
3366	Optionally, also apply similar methods for [b]D48[/b].
3367	
3368	[b]Example CSV file for --anchors option:[/b]	
3369	[i]
3370	Sample,  d13C_VPDB,  d18O_VPDB,     D47,    D48
3371	ETH-1,        2.02,      -2.19,  0.2052,  0.138
3372	ETH-2,      -10.17,     -18.69,  0.2085,  0.138
3373	ETH-3,        1.71,      -1.78,  0.6132,  0.270
3374	ETH-4,            ,           ,  0.4511,  0.223
3375	[/i]
3376	Except for [i]Sample[/i], none of the columns above are mandatory.
3377
3378	[b]Example CSV file for --exclude option:[/b]	
3379	[i]
3380	Sample,  UID
3381	 FOO-1,
3382	 BAR-2,
3383	      ,  A04
3384	      ,  A17
3385	      ,  A88
3386	[/i]
3387	This will exclude all analyses of samples [i]FOO-1[/i] and [i]BAR-2[/i],
3388	and the analyses with UIDs [i]A04[/i], [i]A17[/i], and [i]A88[/i].
3389	Neither column is mandatory.
3390	"""
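
	# Example invocation (hypothetical file names), assuming the package's
	# console script is installed as `D47crunch`:
	#
	#     D47crunch rawdata.csv --anchors my_anchors.csv -o results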
3391
3392	data = D47data()
3393	data.read(rawdata)
3394
3395	if exclude != 'none':
3396		exclude = read_csv(exclude)
3397		exclude_uid = {r['UID'] for r in exclude if 'UID' in r}
3398		exclude_sample = {r['Sample'] for r in exclude if 'Sample' in r}
3399	else:
3400		exclude_uid = []
3401		exclude_sample = []
3402	
3403	data = D47data([r for r in data if r['UID'] not in exclude_uid and r['Sample'] not in exclude_sample])
3404
3405	if anchors != 'none':
3406		anchors = read_csv(anchors)
3407		if len([_ for _ in anchors if 'd13C_VPDB' in _]):
3408			data.Nominal_d13C_VPDB = {
3409				_['Sample']: _['d13C_VPDB']
3410				for _ in anchors
3411				if 'd13C_VPDB' in _
3412				}
3413		if len([_ for _ in anchors if 'd18O_VPDB' in _]):
3414			data.Nominal_d18O_VPDB = {
3415				_['Sample']: _['d18O_VPDB']
3416				for _ in anchors
3417				if 'd18O_VPDB' in _
3418				}
3419		if len([_ for _ in anchors if 'D47' in _]):
3420			data.Nominal_D4x = {
3421				_['Sample']: _['D47']
3422				for _ in anchors
3423				if 'D47' in _
3424				}
3425
3426	data.refresh()
3427	data.wg()
3428	data.crunch()
3429	data.standardize()
3430	data.summary(dir = output_dir)
3431	data.plot_residuals(dir = output_dir, filename = 'D47_residuals.pdf', kde = True)
3432	data.plot_bulk_compositions(dir = output_dir + '/bulk_compositions')
3433	data.plot_sessions(dir = output_dir)
3434	data.save_D47_correl(dir = output_dir)
3435	
3436	if not run_D48:
3437		data.table_of_samples(dir = output_dir)
3438		data.table_of_analyses(dir = output_dir)
3439		data.table_of_sessions(dir = output_dir)
3440
3441
3442	if run_D48:
3443		data2 = D48data()
3445		data2.read(rawdata)
3446
3447		data2 = D48data([r for r in data2 if r['UID'] not in exclude_uid and r['Sample'] not in exclude_sample])
3448
3449		if anchors != 'none':
3450			if len([_ for _ in anchors if 'd13C_VPDB' in _]):
3451				data2.Nominal_d13C_VPDB = {
3452					_['Sample']: _['d13C_VPDB']
3453					for _ in anchors
3454					if 'd13C_VPDB' in _
3455					}
3456			if len([_ for _ in anchors if 'd18O_VPDB' in _]):
3457				data2.Nominal_d18O_VPDB = {
3458					_['Sample']: _['d18O_VPDB']
3459					for _ in anchors
3460					if 'd18O_VPDB' in _
3461					}
3462			if len([_ for _ in anchors if 'D48' in _]):
3463				data2.Nominal_D4x = {
3464					_['Sample']: _['D48']
3465					for _ in anchors
3466					if 'D48' in _
3467					}
3468
3469		data2.refresh()
3470		data2.wg()
3471		data2.crunch()
3472		data2.standardize()
3473		data2.summary(dir = output_dir)
3474		data2.plot_sessions(dir = output_dir)
3475		data2.plot_residuals(dir = output_dir, filename = 'D48_residuals.pdf', kde = True)
3476		data2.plot_distribution_of_analyses(dir = output_dir)
3477		data2.save_D48_correl(dir = output_dir)
3478
3479		table_of_analyses(data, data2, dir = output_dir)
3480		table_of_samples(data, data2, dir = output_dir)
3481		table_of_sessions(data, data2, dir = output_dir)
3482		
3483def __cli():
3484	_app()
68def fCO2eqD47_Petersen(T):
69	'''
70	CO2 equilibrium Δ47 value as a function of T (in degrees C)
71	according to [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127).
72
73	'''
74	return float(_fCO2eqD47_Petersen(T))

79def fCO2eqD47_Wang(T):
80	'''
81	CO2 equilibrium Δ47 value as a function of `T` (in degrees C)
82	according to [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)
83	(supplementary data of [Dennis et al., 2011](https://doi.org/10.1016/j.gca.2011.09.025)).
84	'''
85	return float(_fCO2eqD47_Wang(T))
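
For reference, a usage sketch of the two equilibrium laws (each takes a temperature in degrees C and returns a plain float, in ‰):

```py
D47eq_p = fCO2eqD47_Petersen(25.)  # equilibrium Δ47 of CO2 at 25 °C (Petersen et al., 2019)
D47eq_w = fCO2eqD47_Wang(25.)      # same, according to Wang et al. (2004)
```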

 88def correlated_sum(X, C, w = None):
 89	'''
 90	Compute covariance-aware linear combinations
 91
 92	**Parameters**
 93	
 94	+ `X`: list or 1-D array of values to sum
 95	+ `C`: covariance matrix for the elements of `X`
 96	+ `w`: list or 1-D array of weights to apply to the elements of `X`
 97	       (all equal to 1 by default)
 98
 99	Return the sum (and its SE) of the elements of `X`, with optional weights equal
100	to the elements of `w`, accounting for covariances between the elements of `X`.
101	'''
102	if w is None:
103		w = [1 for x in X]
104	return np.dot(w,X), (np.dot(w,np.dot(C,w)))**.5
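
A short numerical sketch of how covariances propagate through `correlated_sum`:

```py
import numpy as np

X = [0.300, 0.500]              # two values to combine
C = np.array([[1e-4, 5e-5],
              [5e-5, 1e-4]])    # their covariance matrix
avg, se = correlated_sum(X, C, w = [0.5, 0.5])
# avg = 0.400
# se = (0.25e-4 + 0.25e-4 + 2 * 0.25 * 5e-5)**.5 ≈ 0.0087
```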

107def make_csv(x, hsep = ',', vsep = '\n'):
108	'''
109	Formats a list of lists of strings as a CSV
110
111	**Parameters**
112
113	+ `x`: the list of lists of strings to format
114	+ `hsep`: the field separator (`,` by default)
115	+ `vsep`: the line-ending convention to use (`\\n` by default)
116
117	**Example**
118
119	```py
120	print(make_csv([['a', 'b', 'c'], ['d', 'e', 'f']]))
121	```
122
123	outputs:
124
125	```py
126	a,b,c
127	d,e,f
128	```
129	'''
130	return vsep.join([hsep.join(l) for l in x])

133def pf(txt):
134	'''
135	Modify string `txt` to follow `lmfit.Parameter()` naming rules.
136	'''
137	return txt.replace('-','_').replace('.','_').replace(' ','_')
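
For example:

```py
pf('Session 2.1')  # returns 'Session_2_1'
pf('ETH-1')        # returns 'ETH_1'
```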

140def smart_type(x):
141	'''
142	Tries to convert string `x` to a float if it includes a decimal point, or
143	to an integer if it does not. If the conversion fails, return the original
144	string unchanged.
145	'''
146	try:
147		y = float(x)
148	except ValueError:
149		return x
150	if '.' not in x:
151		return int(y)
152	return y
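
For example:

```py
smart_type('5')      # returns the integer 5
smart_type('5.0')    # returns the float 5.0
smart_type('ETH-1')  # returns the string 'ETH-1' unchanged
```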

155def pretty_table(x, header = 1, hsep = '  ', vsep = '–', align = '<'):
156	'''
157	Reads a list of lists of strings and outputs an ascii table
158
159	**Parameters**
160
161	+ `x`: a list of lists of strings
162	+ `header`: the number of lines to treat as header lines
163	+ `hsep`: the horizontal separator between columns
164	+ `vsep`: the character to use as vertical separator
165	+ `align`: string of left (`<`) or right (`>`) alignment characters.
166
167	**Example**
168
169	```py
170	x = [['A', 'B', 'C'], ['1', '1.9999', 'foo'], ['10', 'x', 'bar']]
171	print(pretty_table(x))
172	```
173	yields:	
174	```
175	––  ––––––  –––
176	A        B    C
177	––  ––––––  –––
178	1   1.9999  foo
179	10       x  bar
180	––  ––––––  –––
181	```
182	
183	'''
184	txt = []
185	widths = [np.max([len(e) for e in c]) for c in zip(*x)]
186
187	if len(widths) > len(align):
188		align += '>' * (len(widths)-len(align))
189	sepline = hsep.join([vsep*w for w in widths])
190	txt += [sepline]
191	for k,l in enumerate(x):
192		if k and k == header:
193			txt += [sepline]
194		txt += [hsep.join([f'{e:{a}{w}}' for e, w, a in zip(l, widths, align)])]
195	txt += [sepline]
196	txt += ['']
197	return '\n'.join(txt)

200def transpose_table(x):
201	'''
202	Transpose a list of lists
203
204	**Parameters**
205
206	+ `x`: a list of lists
207
208	**Example**
209
210	```py
211	x = [[1, 2], [3, 4]]
212	print(transpose_table(x)) # yields: [[1, 3], [2, 4]]
213	```
214	'''
215	return [[e for e in c] for c in zip(*x)]

218def w_avg(X, sX) :
219	'''
220	Compute variance-weighted average
221
222	Returns the value and SE of the weighted average of the elements of `X`,
223	with relative weights equal to their inverse variances (`1/sX**2`).
224
225	**Parameters**
226
227	+ `X`: array-like of elements to average
228	+ `sX`: array-like of the corresponding SE values
229
230	**Tip**
231
232	If `X` and `sX` are initially arranged as a list of `(x, sx)` doublets,
233	they may be rearranged using `zip()`:
234
235	```python
236	foo = [(0, 1), (1, 0.5), (2, 0.5)]
237	print(w_avg(*zip(*foo))) # yields: (1.3333333333333333, 0.3333333333333333)
238	```
239	'''
240	X = [ x for x in X ]
241	sX = [ sx for sx in sX ]
242	W = [ sx**-2 for sx in sX ]
243	W = [ w/sum(W) for w in W ]
244	Xavg = sum([ w*x for w,x in zip(W,X) ])
245	sXavg = sum([ w**2*sx**2 for w,sx in zip(W,sX) ])**.5
246	return Xavg, sXavg

def read_csv(filename, sep=''):
def read_csv(filename, sep = ''):
	'''
	Read contents of `filename` in csv format and return a list of dictionaries.

	In the csv string, spaces before and after field separators (`','` by default)
	are optional.

	**Parameters**

	+ `filename`: the csv file to read
	+ `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`,
	whichever appears most often in the contents of `filename`.
	'''
	with open(filename) as fid:
		txt = fid.read()

	if sep == '':
		sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
	txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
	return [{k: smart_type(v) for k,v in zip(txt[0], l) if v} for l in txt[1:]]
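
A minimal usage sketch (the file name below is hypothetical; any csv with a header line works the same way):

# each line after the header becomes one dictionary,
# with values converted by smart_type():
data = read_csv('mydata.csv')
print(data[0]['Sample'], data[0]['d45'])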

def simulate_single_analysis( sample='MYSAMPLE', d13Cwg_VPDB=-4.0, d18Owg_VSMOW=26.0, d13C_VPDB=None, d18O_VPDB=None, D47=None, D48=None, D49=0.0, D17O=0.0, a47=1.0, b47=0.0, c47=-0.9, a48=1.0, b48=0.0, c48=-0.45, Nominal_D47=None, Nominal_D48=None, Nominal_d13C_VPDB=None, Nominal_d18O_VPDB=None, ALPHA_18O_ACID_REACTION=None, R13_VPDB=None, R17_VSMOW=None, R18_VSMOW=None, LAMBDA_17=None, R18_VPDB=None):
def simulate_single_analysis(
	sample = 'MYSAMPLE',
	d13Cwg_VPDB = -4., d18Owg_VSMOW = 26.,
	d13C_VPDB = None, d18O_VPDB = None,
	D47 = None, D48 = None, D49 = 0., D17O = 0.,
	a47 = 1., b47 = 0., c47 = -0.9,
	a48 = 1., b48 = 0., c48 = -0.45,
	Nominal_D47 = None,
	Nominal_D48 = None,
	Nominal_d13C_VPDB = None,
	Nominal_d18O_VPDB = None,
	ALPHA_18O_ACID_REACTION = None,
	R13_VPDB = None,
	R17_VSMOW = None,
	R18_VSMOW = None,
	LAMBDA_17 = None,
	R18_VPDB = None,
	):
	'''
	Compute working-gas delta values for a single analysis, assuming a stochastic working
	gas and a “perfect” measurement (i.e. raw Δ values are identical to absolute values).

	**Parameters**

	+ `sample`: sample name
	+ `d13Cwg_VPDB`, `d18Owg_VSMOW`: bulk composition of the working gas
		(respectively –4 and +26 ‰ by default)
	+ `d13C_VPDB`, `d18O_VPDB`: bulk composition of the carbonate sample
	+ `D47`, `D48`, `D49`, `D17O`: clumped-isotope and oxygen-17 anomalies
		of the carbonate sample
	+ `Nominal_D47`, `Nominal_D48`: where to look up Δ47 and
		Δ48 values if `D47` or `D48` are not specified
	+ `Nominal_d13C_VPDB`, `Nominal_d18O_VPDB`: where to look up δ13C and
		δ18O values if `d13C_VPDB` or `d18O_VPDB` are not specified
	+ `ALPHA_18O_ACID_REACTION`: 18O/16O acid fractionation factor
	+ `R13_VPDB`, `R17_VSMOW`, `R18_VSMOW`, `LAMBDA_17`, `R18_VPDB`: oxygen-17
		correction parameters (by default equal to the `D4xdata` default values)

	Returns a dictionary with fields
	`['Sample', 'D17O', 'd13Cwg_VPDB', 'd18Owg_VSMOW', 'd45', 'd46', 'd47', 'd48', 'd49']`.
	'''

	if Nominal_d13C_VPDB is None:
		Nominal_d13C_VPDB = D4xdata().Nominal_d13C_VPDB

	if Nominal_d18O_VPDB is None:
		Nominal_d18O_VPDB = D4xdata().Nominal_d18O_VPDB

	if ALPHA_18O_ACID_REACTION is None:
		ALPHA_18O_ACID_REACTION = D4xdata().ALPHA_18O_ACID_REACTION

	if R13_VPDB is None:
		R13_VPDB = D4xdata().R13_VPDB

	if R17_VSMOW is None:
		R17_VSMOW = D4xdata().R17_VSMOW

	if R18_VSMOW is None:
		R18_VSMOW = D4xdata().R18_VSMOW

	if LAMBDA_17 is None:
		LAMBDA_17 = D4xdata().LAMBDA_17

	if R18_VPDB is None:
		R18_VPDB = D4xdata().R18_VPDB

	R17_VPDB = R17_VSMOW * (R18_VPDB / R18_VSMOW) ** LAMBDA_17

	if Nominal_D47 is None:
		Nominal_D47 = D47data().Nominal_D47

	if Nominal_D48 is None:
		Nominal_D48 = D48data().Nominal_D48

	if d13C_VPDB is None:
		if sample in Nominal_d13C_VPDB:
			d13C_VPDB = Nominal_d13C_VPDB[sample]
		else:
			raise KeyError(f"Sample {sample} is missing d13C_VPDB value, and it is not defined in Nominal_d13C_VPDB.")

	if d18O_VPDB is None:
		if sample in Nominal_d18O_VPDB:
			d18O_VPDB = Nominal_d18O_VPDB[sample]
		else:
			raise KeyError(f"Sample {sample} is missing d18O_VPDB value, and it is not defined in Nominal_d18O_VPDB.")

	if D47 is None:
		if sample in Nominal_D47:
			D47 = Nominal_D47[sample]
		else:
			raise KeyError(f"Sample {sample} is missing D47 value, and it is not defined in Nominal_D47.")

	if D48 is None:
		if sample in Nominal_D48:
			D48 = Nominal_D48[sample]
		else:
			raise KeyError(f"Sample {sample} is missing D48 value, and it is not defined in Nominal_D48.")

	X = D4xdata()
	X.R13_VPDB = R13_VPDB
	X.R17_VSMOW = R17_VSMOW
	X.R18_VSMOW = R18_VSMOW
	X.LAMBDA_17 = LAMBDA_17
	X.R18_VPDB = R18_VPDB
	X.R17_VPDB = R17_VSMOW * (R18_VPDB / R18_VSMOW)**LAMBDA_17

	R45wg, R46wg, R47wg, R48wg, R49wg = X.compute_isobar_ratios(
		R13 = R13_VPDB * (1 + d13Cwg_VPDB/1000),
		R18 = R18_VSMOW * (1 + d18Owg_VSMOW/1000),
		)
	R45, R46, R47, R48, R49 = X.compute_isobar_ratios(
		R13 = R13_VPDB * (1 + d13C_VPDB/1000),
		R18 = R18_VPDB * (1 + d18O_VPDB/1000) * ALPHA_18O_ACID_REACTION,
		D17O=D17O, D47=D47, D48=D48, D49=D49,
		)
	R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = X.compute_isobar_ratios(
		R13 = R13_VPDB * (1 + d13C_VPDB/1000),
		R18 = R18_VPDB * (1 + d18O_VPDB/1000) * ALPHA_18O_ACID_REACTION,
		D17O=D17O,
		)

	d45 = 1000 * (R45/R45wg - 1)
	d46 = 1000 * (R46/R46wg - 1)
	d47 = 1000 * (R47/R47wg - 1)
	d48 = 1000 * (R48/R48wg - 1)
	d49 = 1000 * (R49/R49wg - 1)

	for k in range(3): # dumb iteration to adjust for small changes in d47
		R47raw = (1 + (a47 * D47 + b47 * d47 + c47)/1000) * R47stoch
		R48raw = (1 + (a48 * D48 + b48 * d48 + c48)/1000) * R48stoch
		d47 = 1000 * (R47raw/R47wg - 1)
		d48 = 1000 * (R48raw/R48wg - 1)

	return dict(
		Sample = sample,
		D17O = D17O,
		d13Cwg_VPDB = d13Cwg_VPDB,
		d18Owg_VSMOW = d18Owg_VSMOW,
		d45 = d45,
		d46 = d46,
		d47 = d47,
		d48 = d48,
		d49 = d49,
		)
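
A usage sketch: since 'ETH-1' is defined in the default nominal tables, its bulk and clumped compositions can be looked up automatically (the printed key list follows from the return value documented above):

a = simulate_single_analysis(sample = 'ETH-1')
print(sorted(a.keys()))
# ['D17O', 'Sample', 'd13Cwg_VPDB', 'd18Owg_VSMOW', 'd45', 'd46', 'd47', 'd48', 'd49']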


def virtual_data( samples=[], a47=1.0, b47=0.0, c47=-0.9, a48=1.0, b48=0.0, c48=-0.45, rd45=0.02, rd46=0.06, rD47=0.015, rD48=0.045, d13Cwg_VPDB=None, d18Owg_VSMOW=None, session=None, Nominal_D47=None, Nominal_D48=None, Nominal_d13C_VPDB=None, Nominal_d18O_VPDB=None, ALPHA_18O_ACID_REACTION=None, R13_VPDB=None, R17_VSMOW=None, R18_VSMOW=None, LAMBDA_17=None, R18_VPDB=None, seed=0, shuffle=True):
def virtual_data(
	samples = [],
	a47 = 1., b47 = 0., c47 = -0.9,
	a48 = 1., b48 = 0., c48 = -0.45,
	rd45 = 0.020, rd46 = 0.060,
	rD47 = 0.015, rD48 = 0.045,
	d13Cwg_VPDB = None, d18Owg_VSMOW = None,
	session = None,
	Nominal_D47 = None, Nominal_D48 = None,
	Nominal_d13C_VPDB = None, Nominal_d18O_VPDB = None,
	ALPHA_18O_ACID_REACTION = None,
	R13_VPDB = None,
	R17_VSMOW = None,
	R18_VSMOW = None,
	LAMBDA_17 = None,
	R18_VPDB = None,
	seed = 0,
	shuffle = True,
	):
	'''
	Return list with simulated analyses from a single session.

	**Parameters**

	+ `samples`: a list of entries; each entry is a dictionary with the following fields:
	    * `Sample`: the name of the sample
	    * `d13C_VPDB`, `d18O_VPDB`: bulk composition of the carbonate sample
	    * `D47`, `D48`, `D49`, `D17O` (all optional): clumped-isotope and oxygen-17 anomalies of the carbonate sample
	    * `N`: how many analyses to generate for this sample
	+ `a47`: scrambling factor for Δ47
	+ `b47`: compositional nonlinearity for Δ47
	+ `c47`: working gas offset for Δ47
	+ `a48`: scrambling factor for Δ48
	+ `b48`: compositional nonlinearity for Δ48
	+ `c48`: working gas offset for Δ48
	+ `rd45`: analytical repeatability of δ45
	+ `rd46`: analytical repeatability of δ46
	+ `rD47`: analytical repeatability of Δ47
	+ `rD48`: analytical repeatability of Δ48
	+ `d13Cwg_VPDB`, `d18Owg_VSMOW`: bulk composition of the working gas
		(by default equal to the `simulate_single_analysis` default values)
	+ `session`: name of the session (no name by default)
	+ `Nominal_D47`, `Nominal_D48`: where to look up Δ47 and Δ48 values
		if `D47` or `D48` are not specified (by default equal to the `simulate_single_analysis` defaults)
	+ `Nominal_d13C_VPDB`, `Nominal_d18O_VPDB`: where to look up δ13C and
		δ18O values if `d13C_VPDB` or `d18O_VPDB` are not specified
		(by default equal to the `simulate_single_analysis` defaults)
	+ `ALPHA_18O_ACID_REACTION`: 18O/16O acid fractionation factor
		(by default equal to the `simulate_single_analysis` defaults)
	+ `R13_VPDB`, `R17_VSMOW`, `R18_VSMOW`, `LAMBDA_17`, `R18_VPDB`: oxygen-17
		correction parameters (by default equal to the `simulate_single_analysis` defaults)
	+ `seed`: explicitly set to a non-zero value to achieve random but repeatable simulations
	+ `shuffle`: randomly reorder the sequence of analyses

	Here is an example of using this method to generate an arbitrary combination of
	anchors and unknowns for a bunch of sessions:

	```py
	.. include:: ../code_examples/virtual_data/example.py
	```

	This should output something like:

	```
	.. include:: ../code_examples/virtual_data/output.txt
	```
	'''

	kwargs = locals().copy()

	from numpy import random as nprandom
	if seed:
		rng = nprandom.default_rng(seed)
	else:
		rng = nprandom.default_rng()

	N = sum([s['N'] for s in samples])
	errors45 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
	errors45 *= rd45 / stdev(errors45) # scale errors to rd45
	errors46 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
	errors46 *= rd46 / stdev(errors46) # scale errors to rd46
	errors47 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
	errors47 *= rD47 / stdev(errors47) # scale errors to rD47
	errors48 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
	errors48 *= rD48 / stdev(errors48) # scale errors to rD48

	k = 0
	out = []
	for s in samples:
		kw = {}
		kw['sample'] = s['Sample']
		kw = {
			**kw,
			**{var: kwargs[var]
				for var in [
					'd13Cwg_VPDB', 'd18Owg_VSMOW', 'ALPHA_18O_ACID_REACTION',
					'Nominal_D47', 'Nominal_D48', 'Nominal_d13C_VPDB', 'Nominal_d18O_VPDB',
					'R13_VPDB', 'R17_VSMOW', 'R18_VSMOW', 'LAMBDA_17', 'R18_VPDB',
					'a47', 'b47', 'c47', 'a48', 'b48', 'c48',
					]
				if kwargs[var] is not None},
			**{var: s[var]
				for var in ['d13C_VPDB', 'd18O_VPDB', 'D47', 'D48', 'D49', 'D17O']
				if var in s},
			}

		sN = s['N']
		while sN:
			out.append(simulate_single_analysis(**kw))
			out[-1]['d45'] += errors45[k]
			out[-1]['d46'] += errors46[k]
			out[-1]['d47'] += (errors45[k] + errors46[k] + errors47[k]) * a47
			out[-1]['d48'] += (2*errors46[k] + errors48[k]) * a48
			sN -= 1
			k += 1

	if session is not None:
		for r in out:
			r['Session'] = session

	if shuffle:
		rng.shuffle(out) # use the seeded generator so that shuffling is repeatable too

	return out

Here is an example of using this method to generate an arbitrary combination of anchors and unknowns for a bunch of sessions:

from D47crunch import virtual_data, D47data

args = dict(
    samples = [
        dict(Sample = 'ETH-1', N = 3),
        dict(Sample = 'ETH-2', N = 3),
        dict(Sample = 'ETH-3', N = 3),
        dict(Sample = 'FOO', N = 3,
            d13C_VPDB = -5., d18O_VPDB = -10.,
            D47 = 0.3, D48 = 0.15),
        dict(Sample = 'BAR', N = 3,
            d13C_VPDB = -15., d18O_VPDB = -2.,
            D47 = 0.6, D48 = 0.2),
        ], rD47 = 0.010, rD48 = 0.030)

session1 = virtual_data(session = 'Session_01', **args, seed = 123)
session2 = virtual_data(session = 'Session_02', **args, seed = 1234)
session3 = virtual_data(session = 'Session_03', **args, seed = 12345)
session4 = virtual_data(session = 'Session_04', **args, seed = 123456)

D = D47data(session1 + session2 + session3 + session4)

D.crunch()
D.standardize()

D.table_of_sessions(verbose = True, save_to_file = False)
D.table_of_samples(verbose = True, save_to_file = False)
D.table_of_analyses(verbose = True, save_to_file = False)

This should output something like:

[table_of_sessions] 
––––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––  ––––––––––––––
Session     Na  Nu  d13Cwg_VPDB  d18Owg_VSMOW  r_d13C  r_d18O   r_D47         a ± SE   1e3 x b ± SE          c ± SE
––––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––  ––––––––––––––
Session_01   9   6       -4.000        26.000  0.0205  0.0633  0.0091  1.015 ± 0.015  0.427 ± 0.232  -0.909 ± 0.006
Session_02   9   6       -4.000        26.000  0.0210  0.0882  0.0100  0.990 ± 0.015  0.484 ± 0.232  -0.905 ± 0.006
Session_03   9   6       -4.000        26.000  0.0186  0.0505  0.0111  0.997 ± 0.015  0.167 ± 0.233  -0.901 ± 0.006
Session_04   9   6       -4.000        26.000  0.0192  0.0467  0.0086  1.017 ± 0.015  0.229 ± 0.232  -0.910 ± 0.006
––––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––  ––––––––––––––

[table_of_samples] 
––––––  ––  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––
Sample   N  d13C_VPDB  d18O_VSMOW     D47      SE    95% CL      SD  p_Levene
––––––  ––  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––
ETH-1   12       2.02       37.01  0.2052                    0.0083          
ETH-2   12     -10.17       19.88  0.2085                    0.0090          
ETH-3   12       1.71       37.46  0.6132                    0.0083          
BAR     12     -15.02       37.22  0.6057  0.0042  ± 0.0085  0.0088     0.753
FOO     12      -5.00       28.89  0.3024  0.0031  ± 0.0062  0.0070     0.497
––––––  ––  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––

[table_of_analyses] 
–––  ––––––––––  ––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  –––––––––  ––––––––
UID     Session  Sample  d13Cwg_VPDB  d18Owg_VSMOW        d45        d46         d47         d48         d49   d13C_VPDB  d18O_VSMOW     D47raw     D48raw     D49raw       D47
–––  ––––––––––  ––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  –––––––––  ––––––––
1    Session_01   ETH-1       -4.000        26.000   6.049381  10.706856   16.135579   21.196941   27.780042    2.057827   36.937067  -0.685751  -0.324384   0.045870  0.212791
2    Session_01   ETH-3       -4.000        26.000   5.755174  11.255104   16.792797   22.451660   28.306614    1.723596   37.497816  -0.270825  -0.181089  -0.195908  0.621458
3    Session_01   ETH-2       -4.000        26.000  -5.982229  -6.110437  -12.827036  -12.492272  -18.023381  -10.166188   19.784916  -0.693555  -0.312598   0.251040  0.217274
4    Session_01   ETH-1       -4.000        26.000   5.995601  10.755323   16.116087   21.285428   27.780042    1.998631   36.986704  -0.696924  -0.333640   0.008600  0.201787
5    Session_01     BAR       -4.000        26.000  -9.920507  10.903408    0.065076   21.704075   10.707292  -14.998270   37.174839  -0.307018  -0.216978  -0.026076  0.592818
6    Session_01     FOO       -4.000        26.000  -0.876454   2.906764    1.341194    5.490264    4.665655   -5.048760   28.984806  -0.608593  -0.329808  -0.114437  0.295055
7    Session_01     FOO       -4.000        26.000  -0.838118   2.819853    1.310384    5.326005    4.665655   -5.004629   28.895933  -0.593755  -0.319861   0.014956  0.309692
8    Session_01   ETH-2       -4.000        26.000  -5.974124  -5.955517  -12.668784  -12.208184  -18.023381  -10.163274   19.943159  -0.694902  -0.336672  -0.063946  0.215880
9    Session_01   ETH-3       -4.000        26.000   5.727341  11.211663   16.713472   22.364770   28.306614    1.695479   37.453503  -0.278056  -0.180158  -0.082015  0.614365
10   Session_01     FOO       -4.000        26.000  -0.848028   2.874679    1.346196    5.439150    4.665655   -5.017230   28.951964  -0.601502  -0.316664  -0.081898  0.302042
11   Session_01     BAR       -4.000        26.000  -9.959983  10.926995    0.053806   21.724901   10.707292  -15.041279   37.199026  -0.300066  -0.243252  -0.029371  0.599675
12   Session_01     BAR       -4.000        26.000  -9.915975  10.968470    0.153453   21.749385   10.707292  -14.995822   37.241294  -0.286638  -0.301325  -0.157376  0.612868
13   Session_01   ETH-3       -4.000        26.000   5.734896  11.229855   16.740410   22.402091   28.306614    1.702875   37.472070  -0.276998  -0.179635  -0.125368  0.615396
14   Session_01   ETH-2       -4.000        26.000  -5.991278  -5.995054  -12.741562  -12.184075  -18.023381  -10.180122   19.902809  -0.711697  -0.232746   0.032602  0.199357
15   Session_01   ETH-1       -4.000        26.000   6.010276  10.840276   16.207960   21.475150   27.780042    2.011176   37.073454  -0.704188  -0.315986  -0.172089  0.194589
16   Session_02   ETH-3       -4.000        26.000   5.757137  11.232751   16.744567   22.398244   28.306614    1.731295   37.514660  -0.298533  -0.189123  -0.154557  0.604363
17   Session_02   ETH-1       -4.000        26.000   5.993918  10.617469   15.991900   21.070358   27.780042    2.006934   36.882679  -0.683329  -0.271476   0.278458  0.216152
18   Session_02   ETH-3       -4.000        26.000   5.719281  11.207303   16.681693   22.370886   28.306614    1.691780   37.488633  -0.296801  -0.165556  -0.065004  0.606143
19   Session_02   ETH-3       -4.000        26.000   5.716356  11.091821   16.582487   22.123857   28.306614    1.692901   37.370126  -0.279100  -0.178789   0.162540  0.624067
20   Session_02   ETH-1       -4.000        26.000   6.030532  10.851030   16.245571   21.457100   27.780042    2.037466   37.122284  -0.698413  -0.354920  -0.214443  0.200795
21   Session_02     BAR       -4.000        26.000  -9.963888  10.865863   -0.023549   21.615868   10.707292  -15.053743   37.174715  -0.313906  -0.229031   0.093637  0.597041
22   Session_02     FOO       -4.000        26.000  -0.819742   2.826793    1.317044    5.330616    4.665655   -4.986618   28.903335  -0.612871  -0.329113  -0.018244  0.294481
23   Session_02   ETH-1       -4.000        26.000   6.019963  10.773112   16.163825   21.331060   27.780042    2.029040   37.042346  -0.692234  -0.324161  -0.051788  0.207075
24   Session_02   ETH-2       -4.000        26.000  -5.982371  -6.036210  -12.762399  -12.309944  -18.023381  -10.175178   19.819614  -0.701348  -0.277354   0.104418  0.212021
25   Session_02     FOO       -4.000        26.000  -0.835046   2.870518    1.355370    5.487896    4.665655   -5.004585   28.948243  -0.601666  -0.259900  -0.087592  0.305777
26   Session_02   ETH-2       -4.000        26.000  -5.950370  -5.959974  -12.650784  -12.197864  -18.023381  -10.143809   19.897777  -0.696916  -0.317263  -0.080604  0.216441
27   Session_02     BAR       -4.000        26.000  -9.936020  10.862339    0.024660   21.563307   10.707292  -15.023836   37.171034  -0.291333  -0.273498   0.070452  0.619812
28   Session_02     FOO       -4.000        26.000  -0.848415   2.849823    1.308081    5.427767    4.665655   -5.018107   28.927036  -0.614791  -0.278426  -0.032784  0.292547
29   Session_02     BAR       -4.000        26.000  -9.957566  10.903888    0.031785   21.739434   10.707292  -15.048386   37.213724  -0.302139  -0.183327   0.012926  0.608897
30   Session_02   ETH-2       -4.000        26.000  -5.993476  -5.944866  -12.696865  -12.149754  -18.023381  -10.190430   19.913381  -0.713779  -0.298963  -0.064251  0.199436
31   Session_03     FOO       -4.000        26.000  -0.800284   2.851299    1.376828    5.379547    4.665655   -4.951581   28.910199  -0.597293  -0.329315  -0.087015  0.304784
32   Session_03   ETH-3       -4.000        26.000   5.753467  11.206589   16.719131   22.373244   28.306614    1.723960   37.511190  -0.294350  -0.161838  -0.099835  0.606103
33   Session_03   ETH-2       -4.000        26.000  -5.997147  -5.905858  -12.655382  -12.081612  -18.023381  -10.165400   19.891551  -0.706536  -0.308464  -0.137414  0.197550
34   Session_03     FOO       -4.000        26.000  -0.873798   2.820799    1.272165    5.370745    4.665655   -5.028782   28.878917  -0.596008  -0.277258   0.051165  0.306090
35   Session_03     BAR       -4.000        26.000  -9.928709  10.989665    0.148059   21.852677   10.707292  -14.976237   37.324152  -0.299358  -0.242185  -0.184835  0.603855
36   Session_03   ETH-2       -4.000        26.000  -6.000290  -5.947172  -12.697463  -12.164602  -18.023381  -10.167221   19.848953  -0.705037  -0.309350  -0.052386  0.199061
37   Session_03   ETH-2       -4.000        26.000  -6.008525  -5.909707  -12.647727  -12.075913  -18.023381  -10.177379   19.887608  -0.683183  -0.294956  -0.117608  0.220975
38   Session_03   ETH-3       -4.000        26.000   5.748546  11.079879   16.580826   22.120063   28.306614    1.723364   37.380534  -0.302133  -0.158882   0.151641  0.598318
39   Session_03     FOO       -4.000        26.000  -0.823857   2.761300    1.258060    5.239992    4.665655   -4.973383   28.817444  -0.603327  -0.288652   0.114488  0.298751
40   Session_03   ETH-1       -4.000        26.000   5.994622  10.743980   16.116098   21.243734   27.780042    1.997857   37.033567  -0.684883  -0.352014   0.031692  0.214449
41   Session_03   ETH-3       -4.000        26.000   5.718991  11.146227   16.640814   22.243185   28.306614    1.689442   37.449023  -0.277332  -0.169668   0.053997  0.623187
42   Session_03   ETH-1       -4.000        26.000   6.040566  10.786620   16.205283   21.374963   27.780042    2.045244   37.077432  -0.685706  -0.307909  -0.099869  0.213609
43   Session_03     BAR       -4.000        26.000  -9.952115  11.034508    0.169809   21.885915   10.707292  -15.002819   37.370451  -0.296804  -0.298351  -0.246731  0.606414
44   Session_03   ETH-1       -4.000        26.000   6.004078  10.683951   16.045192   21.214355   27.780042    2.010134   36.971642  -0.705956  -0.262026   0.138399  0.193323
45   Session_03     BAR       -4.000        26.000  -9.957114  10.898997    0.044946   21.602296   10.707292  -15.003175   37.230716  -0.284699  -0.307849   0.021944  0.618578
46   Session_04   ETH-2       -4.000        26.000  -5.966627  -5.893789  -12.597717  -12.120719  -18.023381  -10.161842   19.911776  -0.691757  -0.372308  -0.193986  0.217132
47   Session_04   ETH-3       -4.000        26.000   5.751908  11.207110   16.726741   22.380392   28.306614    1.705481   37.480657  -0.285776  -0.155878  -0.099197  0.609567
48   Session_04     BAR       -4.000        26.000  -9.951025  10.951923    0.089386   21.738926   10.707292  -15.031949   37.254709  -0.298065  -0.278834  -0.087463  0.601230
49   Session_04     FOO       -4.000        26.000  -0.848192   2.777763    1.251297    5.280272    4.665655   -5.023358   28.822585  -0.601094  -0.281419   0.108186  0.303128
50   Session_04   ETH-1       -4.000        26.000   6.017312  10.735930   16.123043   21.270597   27.780042    2.005824   36.995214  -0.693479  -0.309795   0.023309  0.208980
51   Session_04   ETH-2       -4.000        26.000  -5.973623  -5.975018  -12.694278  -12.194472  -18.023381  -10.166297   19.828211  -0.701951  -0.283570  -0.025935  0.207135
52   Session_04     BAR       -4.000        26.000  -9.931741  10.819830   -0.023748   21.529372   10.707292  -15.006533   37.118743  -0.302866  -0.222623   0.148462  0.596536
53   Session_04   ETH-1       -4.000        26.000   6.023822  10.730714   16.121184   21.235757   27.780042    2.012958   36.989833  -0.696908  -0.333582   0.026555  0.205610
54   Session_04     FOO       -4.000        26.000  -0.791191   2.708220    1.256167    5.145784    4.665655   -4.960004   28.750896  -0.586913  -0.276505   0.183674  0.317065
55   Session_04     FOO       -4.000        26.000  -0.853969   2.805035    1.267571    5.353907    4.665655   -5.030523   28.850660  -0.605611  -0.262571   0.060903  0.298685
56   Session_04   ETH-2       -4.000        26.000  -5.986501  -5.915157  -12.656583  -12.060382  -18.023381  -10.182247   19.889836  -0.709603  -0.268277  -0.130450  0.199604
57   Session_04   ETH-3       -4.000        26.000   5.739420  11.128582   16.641344   22.166106   28.306614    1.695046   37.399884  -0.280608  -0.210162   0.066645  0.614665
58   Session_04     BAR       -4.000        26.000  -9.926078  10.884823    0.060864   21.650722   10.707292  -15.002880   37.185606  -0.287358  -0.232425   0.016044  0.611760
59   Session_04   ETH-1       -4.000        26.000   6.029937  10.766997   16.151273   21.345479   27.780042    2.018148   37.027152  -0.708855  -0.297953  -0.050465  0.193862
60   Session_04   ETH-3       -4.000        26.000   5.798016  11.254135   16.832228   22.432473   28.306614    1.752928   37.528936  -0.275047  -0.197935  -0.239408  0.620088
–––  ––––––––––  ––––––  –––––––––––  ––––––––––––  –––––––––  –––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  ––––––––––  –––––––––  –––––––––  –––––––––  ––––––––


def table_of_samples( data47=None, data48=None, dir='output', filename=None, save_to_file=True, print_out=True, output=None):
def table_of_samples(
	data47 = None,
	data48 = None,
	dir = 'output',
	filename = None,
	save_to_file = True,
	print_out = True,
	output = None,
	):
	'''
	Print out, save to disk and/or return a combined table of samples
	for a pair of `D47data` and `D48data` objects.

	**Parameters**

	+ `data47`: `D47data` instance
	+ `data48`: `D48data` instance
	+ `dir`: the directory in which to save the table
	+ `filename`: the name of the csv file to write to
	+ `save_to_file`: whether to save the table to disk
	+ `print_out`: whether to print out the table
	+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
		if set to `'raw'`: return a list of lists of strings
		(e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
	'''
	if data47 is None:
		if data48 is None:
			raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
		else:
			return data48.table_of_samples(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
	else:
		if data48 is None:
			return data47.table_of_samples(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
		else:
			out47 = data47.table_of_samples(save_to_file = False, print_out = False, output = 'raw')
			out48 = data48.table_of_samples(save_to_file = False, print_out = False, output = 'raw')
			out = transpose_table(transpose_table(out47) + transpose_table(out48)[4:])

			if save_to_file:
				if not os.path.exists(dir):
					os.makedirs(dir)
				if filename is None:
					filename = 'D47D48_samples.csv'
				with open(f'{dir}/{filename}', 'w') as fid:
					fid.write(make_csv(out))
			if print_out:
				print('\n'+pretty_table(out))
			if output == 'raw':
				return out
			elif output == 'pretty':
				return pretty_table(out)
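
A usage sketch for the combined case, assuming `mydata47` and `mydata48` are placeholder names for `D47data` and `D48data` objects that have already been crunched and standardized:

from D47crunch import table_of_samples

table_of_samples(
    data47 = mydata47,
    data48 = mydata48,
    save_to_file = False,  # do not write output/D47D48_samples.csv
    print_out = True,      # print the combined Δ47 + Δ48 sample table
    )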

def table_of_sessions( data47=None, data48=None, dir='output', filename=None, save_to_file=True, print_out=True, output=None):
def table_of_sessions(
	data47 = None,
	data48 = None,
	dir = 'output',
	filename = None,
	save_to_file = True,
	print_out = True,
	output = None,
	):
	'''
	Print out, save to disk and/or return a combined table of sessions
	for a pair of `D47data` and `D48data` objects.
	***Only applicable if the sessions in `data47` and those in `data48`
	consist of the exact same sets of analyses.***

	**Parameters**

	+ `data47`: `D47data` instance
	+ `data48`: `D48data` instance
	+ `dir`: the directory in which to save the table
	+ `filename`: the name of the csv file to write to
	+ `save_to_file`: whether to save the table to disk
	+ `print_out`: whether to print out the table
	+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
		if set to `'raw'`: return a list of lists of strings
		(e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
	'''
	if data47 is None:
		if data48 is None:
			raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
		else:
			return data48.table_of_sessions(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
	else:
		if data48 is None:
			return data47.table_of_sessions(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
		else:
			out47 = data47.table_of_sessions(save_to_file = False, print_out = False, output = 'raw')
			out48 = data48.table_of_sessions(save_to_file = False, print_out = False, output = 'raw')
			for k,x in enumerate(out47[0]):
				if k>7:
					out47[0][k] = out47[0][k].replace('a', 'a_47').replace('b', 'b_47').replace('c', 'c_47')
					out48[0][k] = out48[0][k].replace('a', 'a_48').replace('b', 'b_48').replace('c', 'c_48')
			out = transpose_table(transpose_table(out47) + transpose_table(out48)[7:])

			if save_to_file:
				if not os.path.exists(dir):
					os.makedirs(dir)
				if filename is None:
					filename = 'D47D48_sessions.csv'
				with open(f'{dir}/{filename}', 'w') as fid:
					fid.write(make_csv(out))
			if print_out:
				print('\n'+pretty_table(out))
			if output == 'raw':
				return out
			elif output == 'pretty':
				return pretty_table(out)

def table_of_analyses( data47=None, data48=None, dir='output', filename=None, save_to_file=True, print_out=True, output=None):
def table_of_analyses(
	data47 = None,
	data48 = None,
	dir = 'output',
	filename = None,
	save_to_file = True,
	print_out = True,
	output = None,
	):
	'''
	Print out, save to disk and/or return a combined table of analyses
	for a pair of `D47data` and `D48data` objects.

	If the sessions in `data47` and those in `data48` do not consist of
	the exact same sets of analyses, the table will have two columns
	`Session_47` and `Session_48` instead of a single `Session` column.

	**Parameters**

	+ `data47`: `D47data` instance
	+ `data48`: `D48data` instance
	+ `dir`: the directory in which to save the table
	+ `filename`: the name of the csv file to write to
	+ `save_to_file`: whether to save the table to disk
	+ `print_out`: whether to print out the table
	+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
		if set to `'raw'`: return a list of lists of strings
		(e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
	'''
	if data47 is None:
		if data48 is None:
			raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
		else:
			return data48.table_of_analyses(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
	else:
		if data48 is None:
			return data47.table_of_analyses(
				dir = dir,
				filename = filename,
				save_to_file = save_to_file,
				print_out = print_out,
				output = output
				)
		else:
			out47 = data47.table_of_analyses(save_to_file = False, print_out = False, output = 'raw')
			out48 = data48.table_of_analyses(save_to_file = False, print_out = False, output = 'raw')

			if [l[1] for l in out47[1:]] == [l[1] for l in out48[1:]]: # if sessions are identical
				out = transpose_table(transpose_table(out47) + transpose_table(out48)[-1:])
			else:
				out47[0][1] = 'Session_47'
				out48[0][1] = 'Session_48'
				out47 = transpose_table(out47)
				out48 = transpose_table(out48)
				out = transpose_table(out47[:2] + out48[1:2] + out47[2:] + out48[-1:])

			if save_to_file:
				if not os.path.exists(dir):
					os.makedirs(dir)
				if filename is None:
					filename = 'D47D48_analyses.csv'
				with open(f'{dir}/{filename}', 'w') as fid:
					fid.write(make_csv(out))
			if print_out:
				print('\n'+pretty_table(out))
			if output == 'raw':
				return out
			elif output == 'pretty':
				return pretty_table(out)
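
The `output` argument behaves as in `table_of_samples()`; for instance (a sketch, with `mydata47` and `mydata48` again standing in for fully standardized `D47data` and `D48data` objects):

txt = table_of_analyses(
    data47 = mydata47,
    data48 = mydata48,
    save_to_file = False,
    print_out = False,
    output = 'pretty',  # return the combined table as a printable string
    )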

class D4xdata(builtins.list):
class D4xdata(list):
	'''
	Store and process data for a large set of Δ47 and/or Δ48
	analyses, usually comprising more than one analytical session.
	'''

	### 17O CORRECTION PARAMETERS
	R13_VPDB = 0.01118  # (Chang & Li, 1990)
	'''
	Absolute (13C/12C) ratio of VPDB.
	By default equal to 0.01118 ([Chang & Li, 1990](http://www.cnki.com.cn/Article/CJFDTotal-JXTW199004006.htm))
	'''

	R18_VSMOW = 0.0020052  # (Baertschi, 1976)
	'''
	Absolute (18O/16O) ratio of VSMOW.
	By default equal to 0.0020052 ([Baertschi, 1976](https://doi.org/10.1016/0012-821X(76)90115-1))
	'''

	LAMBDA_17 = 0.528  # (Barkan & Luz, 2005)
	'''
	Mass-dependent exponent for triple oxygen isotopes.
	By default equal to 0.528 ([Barkan & Luz, 2005](https://doi.org/10.1002/rcm.2250))
	'''

	R17_VSMOW = 0.00038475  # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)
	'''
	Absolute (17O/16O) ratio of VSMOW.
	By default equal to 0.00038475
	([Assonov & Brenninkmeijer, 2003](https://dx.doi.org/10.1002/rcm.1011),
	rescaled to `R13_VPDB`)
	'''

	R18_VPDB = R18_VSMOW * 1.03092
	'''
	Absolute (18O/16O) ratio of VPDB.
	By definition equal to `R18_VSMOW * 1.03092`.
	'''

	R17_VPDB = R17_VSMOW * 1.03092 ** LAMBDA_17
	'''
	Absolute (17O/16O) ratio of VPDB.
	By definition equal to `R17_VSMOW * 1.03092 ** LAMBDA_17`.
	'''

	LEVENE_REF_SAMPLE = 'ETH-3'
	'''
	After the Δ4x standardization step, each sample is tested to
	assess whether the Δ4x variance within all analyses for that
	sample differs significantly from that observed for a given reference
	sample (using [Levene's test](https://en.wikipedia.org/wiki/Levene%27s_test),
	which yields a p-value corresponding to the null hypothesis that the
	underlying variances are equal).

	`LEVENE_REF_SAMPLE` (by default equal to `'ETH-3'`) specifies which
	sample should be used as a reference for this test.
	'''

	ALPHA_18O_ACID_REACTION = round(np.exp(3.59 / (90 + 273.15) - 1.79e-3), 6)  # (Kim et al., 2007, calcite)
	'''
	Specifies the 18O/16O fractionation factor generally applicable
	to acid reactions in the dataset. Currently used by `D4xdata.wg()`,
	`D4xdata.standardize_d13C`, and `D4xdata.standardize_d18O`.

	By default equal to 1.008129 (calcite reacted at 90 °C,
	[Kim et al., 2007](https://dx.doi.org/10.1016/j.chemgeo.2007.08.005)).
	'''

	Nominal_d13C_VPDB = {
		'ETH-1': 2.02,
		'ETH-2': -10.17,
		'ETH-3': 1.71,
		}	# (Bernasconi et al., 2018)
	'''
	Nominal δ13C_VPDB values assigned to carbonate standards, used by
	`D4xdata.standardize_d13C()`.

	By default equal to `{'ETH-1': 2.02, 'ETH-2': -10.17, 'ETH-3': 1.71}` after
	[Bernasconi et al. (2018)](https://doi.org/10.1029/2017GC007385).
	'''

	Nominal_d18O_VPDB = {
		'ETH-1': -2.19,
		'ETH-2': -18.69,
		'ETH-3': -1.78,
		}	# (Bernasconi et al., 2018)
	'''
	Nominal δ18O_VPDB values assigned to carbonate standards, used by
	`D4xdata.standardize_d18O()`.

	By default equal to `{'ETH-1': -2.19, 'ETH-2': -18.69, 'ETH-3': -1.78}` after
	[Bernasconi et al. (2018)](https://doi.org/10.1029/2017GC007385).
	'''
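
	# Note (an illustrative sketch, not part of the library source): since these
	# nominal compositions are class attributes, they may be overridden on a given
	# instance before processing, e.g. to add a lab's in-house reference material
	# (the sample name and values below are hypothetical):
	#
	#	mydata = D47data()
	#	mydata.Nominal_d13C_VPDB = {**mydata.Nominal_d13C_VPDB, 'MYREF': 1.23}
	#	mydata.Nominal_d18O_VPDB = {**mydata.Nominal_d18O_VPDB, 'MYREF': -4.56}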

	d13C_STANDARDIZATION_METHOD = '2pt'
	'''
	Method by which to standardize δ13C values:

	+ `'none'`: do not apply any δ13C standardization.
	+ `'1pt'`: within each session, offset all initial δ13C values so as to
	minimize the difference between final δ13C_VPDB values and
	`Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB` is defined).
	+ `'2pt'`: within each session, apply an affine transformation to all δ13C
	values so as to minimize the difference between final δ13C_VPDB
	values and `Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB`
	is defined).
	'''

	d18O_STANDARDIZATION_METHOD = '2pt'
	'''
	Method by which to standardize δ18O values:

	+ `'none'`: do not apply any δ18O standardization.
	+ `'1pt'`: within each session, offset all initial δ18O values so as to
	minimize the difference between final δ18O_VPDB values and
	`Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB` is defined).
	+ `'2pt'`: within each session, apply an affine transformation to all δ18O
	values so as to minimize the difference between final δ18O_VPDB
	values and `Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB`
	is defined).
	'''

	def __init__(self, l = [], mass = '47', logfile = '', session = 'mySession', verbose = False):
		'''
		**Parameters**

		+ `l`: a list of dictionaries, with each dictionary including at least the keys
		`Sample`, `d45`, `d46`, and `d47` or `d48`.
		+ `mass`: `'47'` or `'48'`
		+ `logfile`: if specified, write detailed logs to this file path when calling `D4xdata` methods.
		+ `session`: define session name for analyses without a `Session` key
		+ `verbose`: if `True`, print out detailed logs when calling `D4xdata` methods.

		Returns a `D4xdata` object derived from `list`.
		'''
		self._4x = mass
		self.verbose = verbose
		self.prefix = 'D4xdata'
		self.logfile = logfile
		list.__init__(self, l)
		self.Nf = None
		self.repeatability = {}
		self.refresh(session = session)


	def make_verbal(oldfun):
		'''
		Decorator: allow temporarily changing `self.prefix` and overriding `self.verbose`.
		'''
		@wraps(oldfun)
		def newfun(*args, verbose = '', **kwargs):
			myself = args[0]
			oldprefix = myself.prefix
			myself.prefix = oldfun.__name__
			if verbose != '':
				oldverbose = myself.verbose
				myself.verbose = verbose
			out = oldfun(*args, **kwargs)
			myself.prefix = oldprefix
			if verbose != '':
				myself.verbose = oldverbose
			return out
		return newfun


	def msg(self, txt):
		'''
		Log a message to `self.logfile`, and print it out if `verbose = True`
		'''
		self.log(txt)
		if self.verbose:
			print(f'{f"[{self.prefix}]":<16} {txt}')


	def vmsg(self, txt):
		'''
		Log a message to `self.logfile` and print it out
		'''
		self.log(txt)
		print(txt)


	def log(self, *txts):
		'''
		Log a message to `self.logfile`
		'''
		if self.logfile:
			with open(self.logfile, 'a') as fid:
				for txt in txts:
					fid.write(f'\n{dt.now().strftime("%Y-%m-%d %H:%M:%S")} {f"[{self.prefix}]":<16} {txt}')


	def refresh(self, session = 'mySession'):
		'''
		Update `self.sessions`, `self.samples`, `self.anchors`, and `self.unknowns`.
		'''
		self.fill_in_missing_info(session = session)
		self.refresh_sessions()
		self.refresh_samples()


	def refresh_sessions(self):
		'''
		Update `self.sessions` and set `scrambling_drift`, `slope_drift`, and `wg_drift`
		to `False` for all sessions.
		'''
		self.sessions = {
			s: {'data': [r for r in self if r['Session'] == s]}
			for s in sorted({r['Session'] for r in self})
			}
		for s in self.sessions:
			self.sessions[s]['scrambling_drift'] = False
			self.sessions[s]['slope_drift'] = False
			self.sessions[s]['wg_drift'] = False
			self.sessions[s]['d13C_standardization_method'] = self.d13C_STANDARDIZATION_METHOD
			self.sessions[s]['d18O_standardization_method'] = self.d18O_STANDARDIZATION_METHOD


	def refresh_samples(self):
		'''
		Define `self.samples`, `self.anchors`, and `self.unknowns`.
		'''
		self.samples = {
			s: {'data': [r for r in self if r['Sample'] == s]}
			for s in sorted({r['Sample'] for r in self})
			}
		self.anchors = {s: self.samples[s] for s in self.samples if s in self.Nominal_D4x}
		self.unknowns = {s: self.samples[s] for s in self.samples if s not in self.Nominal_D4x}


	def read(self, filename, sep = '', session = ''):
		'''
		Read file in csv format to load data into a `D47data` object.

		In the csv file, spaces before and after field separators (`','` by default)
		are optional. Each line corresponds to a single analysis.

		The required fields are:

		+ `UID`: a unique identifier
		+ `Session`: an identifier for the analytical session
		+ `Sample`: a sample identifier
		+ `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values

		Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
		VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Working-gas deltas `d47`, `d48`
		and `d49` are optional, and set to NaN by default.

		**Parameters**

		+ `filename`: the path of the file to read
		+ `sep`: csv separator delimiting the fields
		+ `session`: set `Session` field to this string for all analyses
		'''
		with open(filename) as fid:
			self.input(fid.read(), sep = sep, session = session)


	def input(self, txt, sep = '', session = ''):
		'''
		Read `txt` string in csv format to load analysis data into a `D47data` object.

		In the csv string, spaces before and after field separators (`','` by default)
		are optional. Each line corresponds to a single analysis.

		The required fields are:

		+ `UID`: a unique identifier
		+ `Session`: an identifier for the analytical session
		+ `Sample`: a sample identifier
		+ `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values

		Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
		VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Working-gas deltas `d47`, `d48`
		and `d49` are optional, and set to NaN by default.

		**Parameters**

		+ `txt`: the csv string to read
		+ `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`,
		whichever appears most often in `txt`.
		+ `session`: set `Session` field to this string for all analyses
		'''
		if sep == '':
			sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
		txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
		data = [{k: v if k in ['UID', 'Session', 'Sample'] else smart_type(v) for k,v in zip(txt[0], l) if v != ''} for l in txt[1:]]

		if session != '':
			for r in data:
				r['Session'] = session

		self += data
		self.refresh()

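
	# Usage sketch (written as comments so the listing above stays valid Python;
	# the csv string below is hypothetical):
	#
	#	mydata = D47data()
	#	mydata.input('UID,Session,Sample,d45,d46,d47\nA01,S1,ETH-1,5.795,11.627,16.893', session = 'S1')
	#
	# The `session` argument, when specified, overwrites any `Session` field
	# already present in the csv.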
1097	@make_verbal
1098	def wg(self, samples = None, a18_acid = None):
1099		'''
1100		Compute bulk composition of the working gas for each session based on
1101		the carbonate standards defined in both `self.Nominal_d13C_VPDB` and
1102		`self.Nominal_d18O_VPDB`.
1103		'''
1104
1105		self.msg('Computing WG composition:')
1106
1107		if a18_acid is None:
1108			a18_acid = self.ALPHA_18O_ACID_REACTION
1109		if samples is None:
1110			samples = [s for s in self.Nominal_d13C_VPDB if s in self.Nominal_d18O_VPDB]
1111
1112		assert a18_acid, f'Acid fractionation factor should not be zero.'
1113
1114		samples = [s for s in samples if s in self.Nominal_d13C_VPDB and s in self.Nominal_d18O_VPDB]
1115		R45R46_standards = {}
1116		for sample in samples:
1117			d13C_vpdb = self.Nominal_d13C_VPDB[sample]
1118			d18O_vpdb = self.Nominal_d18O_VPDB[sample]
1119			R13_s = self.R13_VPDB * (1 + d13C_vpdb / 1000)
1120			R17_s = self.R17_VPDB * ((1 + d18O_vpdb / 1000) * a18_acid) ** self.LAMBDA_17
1121			R18_s = self.R18_VPDB * (1 + d18O_vpdb / 1000) * a18_acid
1122
1123			C12_s = 1 / (1 + R13_s)
1124			C13_s = R13_s / (1 + R13_s)
1125			C16_s = 1 / (1 + R17_s + R18_s)
1126			C17_s = R17_s / (1 + R17_s + R18_s)
1127			C18_s = R18_s / (1 + R17_s + R18_s)
1128
1129			C626_s = C12_s * C16_s ** 2
1130			C627_s = 2 * C12_s * C16_s * C17_s
1131			C628_s = 2 * C12_s * C16_s * C18_s
1132			C636_s = C13_s * C16_s ** 2
1133			C637_s = 2 * C13_s * C16_s * C17_s
1134			C727_s = C12_s * C17_s ** 2
1135
1136			R45_s = (C627_s + C636_s) / C626_s
1137			R46_s = (C628_s + C637_s + C727_s) / C626_s
1138			R45R46_standards[sample] = (R45_s, R46_s)
1139		
1140		for s in self.sessions:
1141			db = [r for r in self.sessions[s]['data'] if r['Sample'] in samples]
1142			assert db, f'No sample from {samples} found in session "{s}".'
1143# 			dbsamples = sorted({r['Sample'] for r in db})
1144
1145			X = [r['d45'] for r in db]
1146			Y = [R45R46_standards[r['Sample']][0] for r in db]
1147			x1, x2 = np.min(X), np.max(X)
1148
1149			if x1 < x2:
1150				wgcoord = x1/(x1-x2)
1151			else:
1152				wgcoord = 999
1153
1154			if wgcoord < -.5 or wgcoord > 1.5:
1155				# unreasonable to extrapolate to d45 = 0
1156				R45_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
1157			else :
1158				# d45 = 0 is reasonably well bracketed
1159				R45_wg = np.polyfit(X, Y, 1)[1]
1160
1161			X = [r['d46'] for r in db]
1162			Y = [R45R46_standards[r['Sample']][1] for r in db]
1163			x1, x2 = np.min(X), np.max(X)
1164
1165			if x1 < x2:
1166				wgcoord = x1/(x1-x2)
1167			else:
1168				wgcoord = 999
1169
1170			if wgcoord < -.5 or wgcoord > 1.5:
1171				# unreasonable to extrapolate to d46 = 0
1172				R46_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
1173			else :
1174				# d46 = 0 is reasonably well bracketed
1175				R46_wg = np.polyfit(X, Y, 1)[1]
1176
1177			d13Cwg_VPDB, d18Owg_VSMOW = self.compute_bulk_delta(R45_wg, R46_wg)
1178
1179			self.msg(f'Session {s} WG:   δ13C_VPDB = {d13Cwg_VPDB:.3f}   δ18O_VSMOW = {d18Owg_VSMOW:.3f}')
1180
1181			self.sessions[s]['d13Cwg_VPDB'] = d13Cwg_VPDB
1182			self.sessions[s]['d18Owg_VSMOW'] = d18Owg_VSMOW
1183			for r in self.sessions[s]['data']:
1184				r['d13Cwg_VPDB'] = d13Cwg_VPDB
1185				r['d18Owg_VSMOW'] = d18Owg_VSMOW
1186
1187
1188	def compute_bulk_delta(self, R45, R46, D17O = 0):
1189		'''
1190		Compute δ13C_VPDB and δ18O_VSMOW,
1191		by solving the generalized form of equation (17) from
1192		[Brand et al. (2010)](https://doi.org/10.1351/PAC-REP-09-01-05),
1193		assuming that δ18O_VSMOW is not too big (0 ± 50 ‰) and
1194		solving the corresponding second-order Taylor polynomial.
1195		(Appendix A of [Daëron et al., 2016](https://doi.org/10.1016/j.chemgeo.2016.08.014))
1196		'''
1197
1198		K = np.exp(D17O / 1000) * self.R17_VSMOW * self.R18_VSMOW ** -self.LAMBDA_17
1199
1200		A = -3 * K ** 2 * self.R18_VSMOW ** (2 * self.LAMBDA_17)
1201		B = 2 * K * R45 * self.R18_VSMOW ** self.LAMBDA_17
1202		C = 2 * self.R18_VSMOW
1203		D = -R46
1204
1205		aa = A * self.LAMBDA_17 * (2 * self.LAMBDA_17 - 1) + B * self.LAMBDA_17 * (self.LAMBDA_17 - 1) / 2
1206		bb = 2 * A * self.LAMBDA_17 + B * self.LAMBDA_17 + C
1207		cc = A + B + C + D
1208
1209		d18O_VSMOW = 1000 * (-bb + (bb ** 2 - 4 * aa * cc) ** .5) / (2 * aa)
1210
1211		R18 = (1 + d18O_VSMOW / 1000) * self.R18_VSMOW
1212		R17 = K * R18 ** self.LAMBDA_17
1213		R13 = R45 - 2 * R17
1214
1215		d13C_VPDB = 1000 * (R13 / self.R13_VPDB - 1)
1216
1217		return d13C_VPDB, d18O_VSMOW
1218
1219
1220	@make_verbal
1221	def crunch(self, verbose = ''):
1222		'''
1223		Compute bulk composition and raw clumped isotope anomalies for all analyses.
1224		'''
1225		for r in self:
1226			self.compute_bulk_and_clumping_deltas(r)
1227		self.standardize_d13C()
1228		self.standardize_d18O()
1229		self.msg(f"Crunched {len(self)} analyses.")
1230
1231
1232	def fill_in_missing_info(self, session = 'mySession'):
1233		'''
1234		Fill in optional fields with default values
1235		'''
1236		for i,r in enumerate(self):
1237			if 'D17O' not in r:
1238				r['D17O'] = 0.
1239			if 'UID' not in r:
1240				r['UID'] = f'{i+1}'
1241			if 'Session' not in r:
1242				r['Session'] = session
1243			for k in ['d47', 'd48', 'd49']:
1244				if k not in r:
1245					r[k] = np.nan
1246
1247
1248	def standardize_d13C(self):
1249		'''
1250		Perform δ13C standadization within each session `s` according to
1251		`self.sessions[s]['d13C_standardization_method']`, which is defined by default
1252		by `D47data.refresh_sessions()`as equal to `self.d13C_STANDARDIZATION_METHOD`, but
1253		may be redefined abitrarily at a later stage.
1254		'''
1255		for s in self.sessions:
1256			if self.sessions[s]['d13C_standardization_method'] in ['1pt', '2pt']:
1257				XY = [(r['d13C_VPDB'], self.Nominal_d13C_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d13C_VPDB]
1258				X,Y = zip(*XY)
1259				if self.sessions[s]['d13C_standardization_method'] == '1pt':
1260					offset = np.mean(Y) - np.mean(X)
1261					for r in self.sessions[s]['data']:
1262						r['d13C_VPDB'] += offset				
1263				elif self.sessions[s]['d13C_standardization_method'] == '2pt':
1264					a,b = np.polyfit(X,Y,1)
1265					for r in self.sessions[s]['data']:
1266						r['d13C_VPDB'] = a * r['d13C_VPDB'] + b
1267
1268	def standardize_d18O(self):
1269		'''
1270		Perform δ18O standadization within each session `s` according to
1271		`self.ALPHA_18O_ACID_REACTION` and `self.sessions[s]['d18O_standardization_method']`,
1272		which is defined by default by `D47data.refresh_sessions()`as equal to
1273		`self.d18O_STANDARDIZATION_METHOD`, but may be redefined abitrarily at a later stage.
1274		'''
1275		for s in self.sessions:
1276			if self.sessions[s]['d18O_standardization_method'] in ['1pt', '2pt']:
1277				XY = [(r['d18O_VSMOW'], self.Nominal_d18O_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d18O_VPDB]
1278				X,Y = zip(*XY)
1279				Y = [(1000+y) * self.R18_VPDB * self.ALPHA_18O_ACID_REACTION / self.R18_VSMOW - 1000 for y in Y]
1280				if self.sessions[s]['d18O_standardization_method'] == '1pt':
1281					offset = np.mean(Y) - np.mean(X)
1282					for r in self.sessions[s]['data']:
1283						r['d18O_VSMOW'] += offset				
1284				elif self.sessions[s]['d18O_standardization_method'] == '2pt':
1285					a,b = np.polyfit(X,Y,1)
1286					for r in self.sessions[s]['data']:
1287						r['d18O_VSMOW'] = a * r['d18O_VSMOW'] + b
1288	
1289
1290	def compute_bulk_and_clumping_deltas(self, r):
1291		'''
1292		Compute δ13C_VPDB, δ18O_VSMOW, and raw Δ47, Δ48, Δ49 values for a single analysis `r`.
1293		'''
1294
1295		# Compute working gas R13, R18, and isobar ratios
1296		R13_wg = self.R13_VPDB * (1 + r['d13Cwg_VPDB'] / 1000)
1297		R18_wg = self.R18_VSMOW * (1 + r['d18Owg_VSMOW'] / 1000)
1298		R45_wg, R46_wg, R47_wg, R48_wg, R49_wg = self.compute_isobar_ratios(R13_wg, R18_wg)
1299
1300		# Compute analyte isobar ratios
1301		R45 = (1 + r['d45'] / 1000) * R45_wg
1302		R46 = (1 + r['d46'] / 1000) * R46_wg
1303		R47 = (1 + r['d47'] / 1000) * R47_wg
1304		R48 = (1 + r['d48'] / 1000) * R48_wg
1305		R49 = (1 + r['d49'] / 1000) * R49_wg
1306
1307		r['d13C_VPDB'], r['d18O_VSMOW'] = self.compute_bulk_delta(R45, R46, D17O = r['D17O'])
1308		R13 = (1 + r['d13C_VPDB'] / 1000) * self.R13_VPDB
1309		R18 = (1 + r['d18O_VSMOW'] / 1000) * self.R18_VSMOW
1310
1311		# Compute stochastic isobar ratios of the analyte
1312		R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = self.compute_isobar_ratios(
1313			R13, R18, D17O = r['D17O']
1314		)
1315
1316		# Check that R45/R45stoch and R46/R46stoch are indistinguishable from 1,
1317		# and raise a warning if the corresponding anomalies exceed 0.05 ppm.
1318		if (R45 / R45stoch - 1) > 5e-8:
1319			self.vmsg(f'This is unexpected: R45/R45stoch - 1 = {1e6 * (R45 / R45stoch - 1):.3f} ppm')
1320		if (R46 / R46stoch - 1) > 5e-8:
1321			self.vmsg(f'This is unexpected: R46/R46stoch - 1 = {1e6 * (R46 / R46stoch - 1):.3f} ppm')
1322
1323		# Compute raw clumped isotope anomalies
1324		r['D47raw'] = 1000 * (R47 / R47stoch - 1)
1325		r['D48raw'] = 1000 * (R48 / R48stoch - 1)
1326		r['D49raw'] = 1000 * (R49 / R49stoch - 1)
1327
1328
1329	def compute_isobar_ratios(self, R13, R18, D17O=0, D47=0, D48=0, D49=0):
1330		'''
1331		Compute isobar ratios for a sample with isotopic ratios `R13` and `R18`,
1332		optionally accounting for non-zero values of Δ17O (`D17O`) and clumped isotope
1333		anomalies (`D47`, `D48`, `D49`), all expressed in permil.
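
		**Example**

		A minimal sketch, assuming `mydata` is a `D47data` instance, with input
		ratios chosen near the VPDB and VSMOW reference values for illustration only:

		```py
		R45, R46, R47, R48, R49 = mydata.compute_isobar_ratios(
			R13 = 0.011180,  # hypothetical 13C/12C ratio
			R18 = 0.0020052, # hypothetical 18O/16O ratio
			)
		```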
1334		'''
1335
1336		# Compute R17
1337		R17 = self.R17_VSMOW * np.exp(D17O / 1000) * (R18 / self.R18_VSMOW) ** self.LAMBDA_17
1338
1339		# Compute isotope concentrations
1340		C12 = (1 + R13) ** -1
1341		C13 = C12 * R13
1342		C16 = (1 + R17 + R18) ** -1
1343		C17 = C16 * R17
1344		C18 = C16 * R18
1345
1346		# Compute stochastic isotopologue concentrations
1347		C626 = C16 * C12 * C16
1348		C627 = C16 * C12 * C17 * 2
1349		C628 = C16 * C12 * C18 * 2
1350		C636 = C16 * C13 * C16
1351		C637 = C16 * C13 * C17 * 2
1352		C638 = C16 * C13 * C18 * 2
1353		C727 = C17 * C12 * C17
1354		C728 = C17 * C12 * C18 * 2
1355		C737 = C17 * C13 * C17
1356		C738 = C17 * C13 * C18 * 2
1357		C828 = C18 * C12 * C18
1358		C838 = C18 * C13 * C18
1359
1360		# Compute stochastic isobar ratios
1361		R45 = (C636 + C627) / C626
1362		R46 = (C628 + C637 + C727) / C626
1363		R47 = (C638 + C728 + C737) / C626
1364		R48 = (C738 + C828) / C626
1365		R49 = C838 / C626
1366
1367		# Account for clumped isotope anomalies (deviations from the stochastic distribution)
1368		R47 *= 1 + D47 / 1000
1369		R48 *= 1 + D48 / 1000
1370		R49 *= 1 + D49 / 1000
1371
1372		# Return isobar ratios
1373		return R45, R46, R47, R48, R49
1374
1375
1376	def split_samples(self, samples_to_split = 'all', grouping = 'by_session'):
1377		'''
1378		Split unknown samples by UID (treat all analyses as different samples)
1379		or by session (treat analyses of a given sample in different sessions as
1380		different samples).
1381
1382		**Parameters**
1383
1384		+ `samples_to_split`: a list of samples to split, e.g., `['IAEA-C1', 'IAEA-C2']`
1385		+ `grouping`: `by_uid` | `by_session`
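
		**Example**

		A minimal sketch, assuming `mydata` was read and crunched as in the tutorial
		and `MYSAMPLE-1` is one of its unknowns:

		```py
		mydata.split_samples(['MYSAMPLE-1'], grouping = 'by_session')
		mydata.standardize(method = 'pooled')
		mydata.unsplit_samples()
		```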
1386		'''
1387		if samples_to_split == 'all':
1388			samples_to_split = [s for s in self.unknowns]
1389		gkeys = {'by_uid':'UID', 'by_session':'Session'}
1390		self.grouping = grouping.lower()
1391		if self.grouping in gkeys:
1392			gkey = gkeys[self.grouping]
1393		for r in self:
1394			if r['Sample'] in samples_to_split:
1395				r['Sample_original'] = r['Sample']
1396				r['Sample'] = f"{r['Sample']}__{r[gkey]}"
1397			elif r['Sample'] in self.unknowns:
1398				r['Sample_original'] = r['Sample']
1399		self.refresh_samples()
1400
1401
1402	def unsplit_samples(self, tables = False):
1403		'''
1404		Reverse the effects of `D47data.split_samples()`.
1405		
1406		This should only be used after `D4xdata.standardize()` with `method='pooled'`.
1407		
1408		After `D4xdata.standardize()` with `method='indep_sessions'`, one should
1409		probably use `D4xdata.combine_samples()` instead to reverse the effects of
1410		`D47data.split_samples()` with `grouping='by_uid'`, or `w_avg()` to reverse the
1411		effects of `D47data.split_samples()` with `grouping='by_session'` (because in
1412		that case session-averaged Δ4x values are statistically independent).
1413		'''
1414		unknowns_old = sorted({s for s in self.unknowns})
1415		CM_old = self.standardization.covar[:,:]
1416		VD_old = self.standardization.params.valuesdict().copy()
1417		vars_old = self.standardization.var_names
1418
1419		unknowns_new = sorted({r['Sample_original'] for r in self if 'Sample_original' in r})
1420
1421		Ns = len(vars_old) - len(unknowns_old)
1422		vars_new = vars_old[:Ns] + [f'D{self._4x}_{pf(u)}' for u in unknowns_new]
1423		VD_new = {k: VD_old[k] for k in vars_old[:Ns]}
1424
1425		W = np.zeros((len(vars_new), len(vars_old)))
1426		W[:Ns,:Ns] = np.eye(Ns)
1427		for u in unknowns_new:
1428			splits = sorted({r['Sample'] for r in self if 'Sample_original' in r and r['Sample_original'] == u})
1429			if self.grouping == 'by_session':
1430				weights = [self.samples[s][f'SE_D{self._4x}']**-2 for s in splits]
1431			elif self.grouping == 'by_uid':
1432				weights = [1 for s in splits]
1433			sw = sum(weights)
1434			weights = [w/sw for w in weights]
1435			W[vars_new.index(f'D{self._4x}_{pf(u)}'),[vars_old.index(f'D{self._4x}_{pf(s)}') for s in splits]] = weights[:]
1436
1437		CM_new = W @ CM_old @ W.T
1438		V = W @ np.array([[VD_old[k]] for k in vars_old])
1439		VD_new = {k:v[0] for k,v in zip(vars_new, V)}
1440
1441		self.standardization.covar = CM_new
1442		self.standardization.params.valuesdict = lambda : VD_new
1443		self.standardization.var_names = vars_new
1444
1445		for r in self:
1446			if r['Sample'] in self.unknowns:
1447				r['Sample_split'] = r['Sample']
1448				r['Sample'] = r['Sample_original']
1449
1450		self.refresh_samples()
1451		self.consolidate_samples()
1452		self.repeatabilities()
1453
1454		if tables:
1455			self.table_of_analyses()
1456			self.table_of_samples()
1457
1458	def assign_timestamps(self):
1459		'''
1460		Assign a time field `t` of type `float` to each analysis.
1461
1462		If `TimeTag` is one of the data fields, `t` is equal within a given session
1463		to `TimeTag` minus the mean value of `TimeTag` for that session.
1464		Otherwise, `TimeTag` defaults to the index of each analysis within its
1465		session, and `t` is defined as above.
1466		'''
1467		for session in self.sessions:
1468			sdata = self.sessions[session]['data']
1469			try:
1470				t0 = np.mean([r['TimeTag'] for r in sdata])
1471				for r in sdata:
1472					r['t'] = r['TimeTag'] - t0
1473			except KeyError:
1474				t0 = (len(sdata)-1)/2
1475				for t,r in enumerate(sdata):
1476					r['t'] = t - t0
1477
1478
1479	def report(self):
1480		'''
1481		Prints a report on the standardization fit.
1482		Only applicable after `D4xdata.standardize(method='pooled')`.
1483		'''
1484		report_fit(self.standardization)
1485
1486
1487	def combine_samples(self, sample_groups):
1488		'''
1489		Combine analyses of different samples to compute weighted average Δ4x
1490		and new error (co)variances corresponding to the groups defined by the `sample_groups`
1491		dictionary.
1492		
1493		Caution: samples are weighted by number of replicate analyses, which is a
1494		reasonable default behavior but is not always optimal (e.g., in the case of strongly
1495		correlated analytical errors for one or more samples).
1496		
1497		Returns a tuple of:
1498		
1499		+ the list of group names
1500		+ an array of the corresponding Δ4x values
1501		+ the corresponding (co)variance matrix
1502		
1503		**Parameters**
1504
1505		+ `sample_groups`: a dictionary of the form:
1506		```py
1507		{'group1': ['sample_1', 'sample_2'],
1508		 'group2': ['sample_3', 'sample_4', 'sample_5']}
1509		```
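
		**Example**

		A hedged sketch with hypothetical unknown sample names, for a
		standardized instance `mydata`:

		```py
		groups, D47avg, CM = mydata.combine_samples({
			'groupA': ['MYSAMPLE-1'],
			'groupB': ['MYSAMPLE-2', 'MYSAMPLE-3'],
			})
		```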
1510		'''
1511		
1512		samples = [s for k in sorted(sample_groups.keys()) for s in sorted(sample_groups[k])]
1513		groups = sorted(sample_groups.keys())
1514		group_total_weights = {k: sum([self.samples[s]['N'] for s in sample_groups[k]]) for k in groups}
1515		D4x_old = np.array([[self.samples[x][f'D{self._4x}']] for x in samples])
1516		CM_old = np.array([[self.sample_D4x_covar(x,y) for x in samples] for y in samples])
1517		W = np.array([
1518			[self.samples[i]['N']/group_total_weights[j] if i in sample_groups[j] else 0 for i in samples]
1519			for j in groups])
1520		D4x_new = W @ D4x_old
1521		CM_new = W @ CM_old @ W.T
1522
1523		return groups, D4x_new[:,0], CM_new
1524		
1525
1526	@make_verbal
1527	def standardize(self,
1528		method = 'pooled',
1529		weighted_sessions = [],
1530		consolidate = True,
1531		consolidate_tables = False,
1532		consolidate_plots = False,
1533		constraints = {},
1534		):
1535		'''
1536		Compute absolute Δ4x values for all replicate analyses and for sample averages.
1537		If the `method` argument is set to `'pooled'`, the standardization processes all sessions
1538		in a single step, assuming that all samples (anchors and unknowns alike) are homogeneous,
1539		i.e. that their true Δ4x value does not change between sessions
1540		([Daëron, 2021](https://doi.org/10.1029/2020GC009592)). If the `method` argument is set to
1541		`'indep_sessions'`, the standardization processes each session independently, based only
1542		on anchor analyses.
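
		**Example**

		A minimal sketch, assuming `mydata` was read and crunched beforehand:

		```py
		mydata.standardize()                             # pooled fit of all sessions
		# mydata.standardize(method = 'indep_sessions')  # or fit each session separately
		```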
1543		'''
1544
1545		self.standardization_method = method
1546		self.assign_timestamps()
1547
1548		if method == 'pooled':
1549			if weighted_sessions:
1550				for session_group in weighted_sessions:
1551					if self._4x == '47':
1552						X = D47data([r for r in self if r['Session'] in session_group])
1553					elif self._4x == '48':
1554						X = D48data([r for r in self if r['Session'] in session_group])
1555					X.Nominal_D4x = self.Nominal_D4x.copy()
1556					X.refresh()
1557					result = X.standardize(method = 'pooled', weighted_sessions = [], consolidate = False)
1558					w = np.sqrt(result.redchi)
1559					self.msg(f'Session group {session_group} RMSWD = {w:.4f}')
1560					for r in X:
1561						r[f'wD{self._4x}raw'] *= w
1562			else:
1563				self.msg(f'All D{self._4x}raw weights set to 1 ‰')
1564				for r in self:
1565					r[f'wD{self._4x}raw'] = 1.
1566
1567			params = Parameters()
1568			for k,session in enumerate(self.sessions):
1569				self.msg(f"Session {session}: scrambling_drift is {self.sessions[session]['scrambling_drift']}.")
1570				self.msg(f"Session {session}: slope_drift is {self.sessions[session]['slope_drift']}.")
1571				self.msg(f"Session {session}: wg_drift is {self.sessions[session]['wg_drift']}.")
1572				s = pf(session)
1573				params.add(f'a_{s}', value = 0.9)
1574				params.add(f'b_{s}', value = 0.)
1575				params.add(f'c_{s}', value = -0.9)
1576				params.add(f'a2_{s}', value = 0.,
1577# 					vary = self.sessions[session]['scrambling_drift'],
1578					)
1579				params.add(f'b2_{s}', value = 0.,
1580# 					vary = self.sessions[session]['slope_drift'],
1581					)
1582				params.add(f'c2_{s}', value = 0.,
1583# 					vary = self.sessions[session]['wg_drift'],
1584					)
1585				if not self.sessions[session]['scrambling_drift']:
1586					params[f'a2_{s}'].expr = '0'
1587				if not self.sessions[session]['slope_drift']:
1588					params[f'b2_{s}'].expr = '0'
1589				if not self.sessions[session]['wg_drift']:
1590					params[f'c2_{s}'].expr = '0'
1591
1592			for sample in self.unknowns:
1593				params.add(f'D{self._4x}_{pf(sample)}', value = 0.5)
1594
1595			for k in constraints:
1596				params[k].expr = constraints[k]
1597
1598			def residuals(p):
1599				R = []
1600				for r in self:
1601					session = pf(r['Session'])
1602					sample = pf(r['Sample'])
1603					if r['Sample'] in self.Nominal_D4x:
1604						R += [ (
1605							r[f'D{self._4x}raw'] - (
1606								p[f'a_{session}'] * self.Nominal_D4x[r['Sample']]
1607								+ p[f'b_{session}'] * r[f'd{self._4x}']
1608								+	p[f'c_{session}']
1609								+ r['t'] * (
1610									p[f'a2_{session}'] * self.Nominal_D4x[r['Sample']]
1611									+ p[f'b2_{session}'] * r[f'd{self._4x}']
1612									+	p[f'c2_{session}']
1613									)
1614								)
1615							) / r[f'wD{self._4x}raw'] ]
1616					else:
1617						R += [ (
1618							r[f'D{self._4x}raw'] - (
1619								p[f'a_{session}'] * p[f'D{self._4x}_{sample}']
1620								+ p[f'b_{session}'] * r[f'd{self._4x}']
1621								+	p[f'c_{session}']
1622								+ r['t'] * (
1623									p[f'a2_{session}'] * p[f'D{self._4x}_{sample}']
1624									+ p[f'b2_{session}'] * r[f'd{self._4x}']
1625									+	p[f'c2_{session}']
1626									)
1627								)
1628							) / r[f'wD{self._4x}raw'] ]
1629				return R
1630
1631			M = Minimizer(residuals, params)
1632			result = M.least_squares()
1633			self.Nf = result.nfree
1634			self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
1635			new_names, new_covar, new_se = _fullcovar(result)[:3]
1636			result.var_names = new_names
1637			result.covar = new_covar
1638
1639			for r in self:
1640				s = pf(r["Session"])
1641				a = result.params.valuesdict()[f'a_{s}']
1642				b = result.params.valuesdict()[f'b_{s}']
1643				c = result.params.valuesdict()[f'c_{s}']
1644				a2 = result.params.valuesdict()[f'a2_{s}']
1645				b2 = result.params.valuesdict()[f'b2_{s}']
1646				c2 = result.params.valuesdict()[f'c2_{s}']
1647				r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
1648				
1649
1650			self.standardization = result
1651
1652			for session in self.sessions:
1653				self.sessions[session]['Np'] = 3
1654				for k in ['scrambling', 'slope', 'wg']:
1655					if self.sessions[session][f'{k}_drift']:
1656						self.sessions[session]['Np'] += 1
1657
1658			if consolidate:
1659				self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
1660			return result
1661
1662
1663		elif method == 'indep_sessions':
1664
1665			if weighted_sessions:
1666				for session_group in weighted_sessions:
1667					X = D4xdata([r for r in self if r['Session'] in session_group], mass = self._4x)
1668					X.Nominal_D4x = self.Nominal_D4x.copy()
1669					X.refresh()
1670					# This is only done to assign r['wD47raw'] for r in X:
1671					X.standardize(method = method, weighted_sessions = [], consolidate = False)
1672					self.msg(f'D{self._4x}raw weights set to {1000*X[0][f"wD{self._4x}raw"]:.1f} ppm for sessions in {session_group}')
1673			else:
1674				self.msg('All weights set to 1 ‰')
1675				for r in self:
1676					r[f'wD{self._4x}raw'] = 1
1677
1678			for session in self.sessions:
1679				s = self.sessions[session]
1680				p_names = ['a', 'b', 'c', 'a2', 'b2', 'c2']
1681				p_active = [True, True, True, s['scrambling_drift'], s['slope_drift'], s['wg_drift']]
1682				s['Np'] = sum(p_active)
1683				sdata = s['data']
1684
1685				A = np.array([
1686					[
1687						self.Nominal_D4x[r['Sample']] / r[f'wD{self._4x}raw'],
1688						r[f'd{self._4x}'] / r[f'wD{self._4x}raw'],
1689						1 / r[f'wD{self._4x}raw'],
1690						self.Nominal_D4x[r['Sample']] * r['t'] / r[f'wD{self._4x}raw'],
1691						r[f'd{self._4x}'] * r['t'] / r[f'wD{self._4x}raw'],
1692						r['t'] / r[f'wD{self._4x}raw']
1693						]
1694					for r in sdata if r['Sample'] in self.anchors
1695					])[:,p_active] # only keep columns for the active parameters
1696				Y = np.array([[r[f'D{self._4x}raw'] / r[f'wD{self._4x}raw']] for r in sdata if r['Sample'] in self.anchors])
1697				s['Na'] = Y.size
1698				CM = linalg.inv(A.T @ A)
1699				bf = (CM @ A.T @ Y).T[0,:]
1700				k = 0
1701				for n,a in zip(p_names, p_active):
1702					if a:
1703						s[n] = bf[k]
1704# 						self.msg(f'{n} = {bf[k]}')
1705						k += 1
1706					else:
1707						s[n] = 0.
1708# 						self.msg(f'{n} = 0.0')
1709
1710				for r in sdata :
1711					a, b, c, a2, b2, c2 = s['a'], s['b'], s['c'], s['a2'], s['b2'], s['c2']
1712					r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
1713					r[f'wD{self._4x}'] = r[f'wD{self._4x}raw'] / (a + a2 * r['t'])
1714
1715				s['CM'] = np.zeros((6,6))
1716				i = 0
1717				k_active = [j for j,a in enumerate(p_active) if a]
1718				for j,a in enumerate(p_active):
1719					if a:
1720						s['CM'][j,k_active] = CM[i,:]
1721						i += 1
1722
1723			if not weighted_sessions:
1724				w = self.rmswd()['rmswd']
1725				for r in self:
1726						r[f'wD{self._4x}'] *= w
1727						r[f'wD{self._4x}raw'] *= w
1728				for session in self.sessions:
1729					self.sessions[session]['CM'] *= w**2
1730
1731			for session in self.sessions:
1732				s = self.sessions[session]
1733				s['SE_a'] = s['CM'][0,0]**.5
1734				s['SE_b'] = s['CM'][1,1]**.5
1735				s['SE_c'] = s['CM'][2,2]**.5
1736				s['SE_a2'] = s['CM'][3,3]**.5
1737				s['SE_b2'] = s['CM'][4,4]**.5
1738				s['SE_c2'] = s['CM'][5,5]**.5
1739
1740			if not weighted_sessions:
1741				self.Nf = len(self) - len(self.unknowns) - np.sum([self.sessions[s]['Np'] for s in self.sessions])
1742			else:
1743				self.Nf = 0
1744				for sg in weighted_sessions:
1745					self.Nf += self.rmswd(sessions = sg)['Nf']
1746
1747			self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
1748
1749			avgD4x = {
1750				sample: np.mean([r[f'D{self._4x}'] for r in self if r['Sample'] == sample])
1751				for sample in self.samples
1752				}
1753			chi2 = np.sum([(r[f'D{self._4x}'] - avgD4x[r['Sample']])**2 for r in self])
1754			rD4x = (chi2/self.Nf)**.5
1755			self.repeatability[f'sigma_{self._4x}'] = rD4x
1756
1757			if consolidate:
1758				self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
1759
1760
1761	def standardization_error(self, session, d4x, D4x, t = 0):
1762		'''
1763		Compute standardization error for a given session and
1764		(δ47, Δ47) composition.
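
		**Example**

		A hedged sketch, assuming the data have been standardized and a session
		named `'mySession'` exists:

		```py
		sigma = mydata.standardization_error('mySession', d4x = 20., D4x = 0.6)
		```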
1765		'''
1766		a = self.sessions[session]['a']
1767		b = self.sessions[session]['b']
1768		c = self.sessions[session]['c']
1769		a2 = self.sessions[session]['a2']
1770		b2 = self.sessions[session]['b2']
1771		c2 = self.sessions[session]['c2']
1772		CM = self.sessions[session]['CM']
1773
1774		x, y = D4x, d4x
1775		z = a * x + b * y + c + a2 * x * t + b2 * y * t + c2 * t
1776# 		x = (z - b*y - b2*y*t - c - c2*t) / (a+a2*t)
1777		dxdy = -(b+b2*t) / (a+a2*t)
1778		dxdz = 1. / (a+a2*t)
1779		dxda = -x / (a+a2*t)
1780		dxdb = -y / (a+a2*t)
1781		dxdc = -1. / (a+a2*t)
1782		dxda2 = -x * t / (a+a2*t)
1783		dxdb2 = -y * t / (a+a2*t)
1784		dxdc2 = -t / (a+a2*t)
1785		V = np.array([dxda, dxdb, dxdc, dxda2, dxdb2, dxdc2])
1786		sx = (V @ CM @ V.T) ** .5
1787		return sx
1788
1789
1790	@make_verbal
1791	def summary(self,
1792		dir = 'output',
1793		filename = None,
1794		save_to_file = True,
1795		print_out = True,
1796		):
1797		'''
1798		Print out and/or save to disk a summary of the standardization results.
1799
1800		**Parameters**
1801
1802		+ `dir`: the directory in which to save the table
1803		+ `filename`: the name of the csv file to write to
1804		+ `save_to_file`: whether to save the table to disk
1805		+ `print_out`: whether to print out the table
1806		'''
1807
1808		out = []
1809		out += [['N samples (anchors + unknowns)', f"{len(self.samples)} ({len(self.anchors)} + {len(self.unknowns)})"]]
1810		out += [['N analyses (anchors + unknowns)', f"{len(self)} ({len([r for r in self if r['Sample'] in self.anchors])} + {len([r for r in self if r['Sample'] in self.unknowns])})"]]
1811		out += [['Repeatability of δ13C_VPDB', f"{1000 * self.repeatability['r_d13C_VPDB']:.1f} ppm"]]
1812		out += [['Repeatability of δ18O_VSMOW', f"{1000 * self.repeatability['r_d18O_VSMOW']:.1f} ppm"]]
1813		out += [[f'Repeatability of Δ{self._4x} (anchors)', f"{1000 * self.repeatability[f'r_D{self._4x}a']:.1f} ppm"]]
1814		out += [[f'Repeatability of Δ{self._4x} (unknowns)', f"{1000 * self.repeatability[f'r_D{self._4x}u']:.1f} ppm"]]
1815		out += [[f'Repeatability of Δ{self._4x} (all)', f"{1000 * self.repeatability[f'r_D{self._4x}']:.1f} ppm"]]
1816		out += [['Model degrees of freedom', f"{self.Nf}"]]
1817		out += [['Student\'s 95% t-factor', f"{self.t95:.2f}"]]
1818		out += [['Standardization method', self.standardization_method]]
1819
1820		if save_to_file:
1821			if not os.path.exists(dir):
1822				os.makedirs(dir)
1823			if filename is None:
1824				filename = f'D{self._4x}_summary.csv'
1825			with open(f'{dir}/{filename}', 'w') as fid:
1826				fid.write(make_csv(out))
1827		if print_out:
1828			self.msg('\n' + pretty_table(out, header = 0))
1829
1830
1831	@make_verbal
1832	def table_of_sessions(self,
1833		dir = 'output',
1834		filename = None,
1835		save_to_file = True,
1836		print_out = True,
1837		output = None,
1838		):
1839		'''
1840		Print out and/or save to disk a table of sessions.
1841
1842		**Parameters**
1843
1844		+ `dir`: the directory in which to save the table
1845		+ `filename`: the name of the csv file to write to
1846		+ `save_to_file`: whether to save the table to disk
1847		+ `print_out`: whether to print out the table
1848		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
1849		    if set to `'raw'`: return a list of list of strings
1850		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
1851		'''
1852		include_a2 = any([self.sessions[session]['scrambling_drift'] for session in self.sessions])
1853		include_b2 = any([self.sessions[session]['slope_drift'] for session in self.sessions])
1854		include_c2 = any([self.sessions[session]['wg_drift'] for session in self.sessions])
1855
1856		out = [['Session','Na','Nu','d13Cwg_VPDB','d18Owg_VSMOW','r_d13C','r_d18O',f'r_D{self._4x}','a ± SE','1e3 x b ± SE','c ± SE']]
1857		if include_a2:
1858			out[-1] += ['a2 ± SE']
1859		if include_b2:
1860			out[-1] += ['b2 ± SE']
1861		if include_c2:
1862			out[-1] += ['c2 ± SE']
1863		for session in self.sessions:
1864			out += [[
1865				session,
1866				f"{self.sessions[session]['Na']}",
1867				f"{self.sessions[session]['Nu']}",
1868				f"{self.sessions[session]['d13Cwg_VPDB']:.3f}",
1869				f"{self.sessions[session]['d18Owg_VSMOW']:.3f}",
1870				f"{self.sessions[session]['r_d13C_VPDB']:.4f}",
1871				f"{self.sessions[session]['r_d18O_VSMOW']:.4f}",
1872				f"{self.sessions[session][f'r_D{self._4x}']:.4f}",
1873				f"{self.sessions[session]['a']:.3f} ± {self.sessions[session]['SE_a']:.3f}",
1874				f"{1e3*self.sessions[session]['b']:.3f} ± {1e3*self.sessions[session]['SE_b']:.3f}",
1875				f"{self.sessions[session]['c']:.3f} ± {self.sessions[session]['SE_c']:.3f}",
1876				]]
1877			if include_a2:
1878				if self.sessions[session]['scrambling_drift']:
1879					out[-1] += [f"{self.sessions[session]['a2']:.1e} ± {self.sessions[session]['SE_a2']:.1e}"]
1880				else:
1881					out[-1] += ['']
1882			if include_b2:
1883				if self.sessions[session]['slope_drift']:
1884					out[-1] += [f"{self.sessions[session]['b2']:.1e} ± {self.sessions[session]['SE_b2']:.1e}"]
1885				else:
1886					out[-1] += ['']
1887			if include_c2:
1888				if self.sessions[session]['wg_drift']:
1889					out[-1] += [f"{self.sessions[session]['c2']:.1e} ± {self.sessions[session]['SE_c2']:.1e}"]
1890				else:
1891					out[-1] += ['']
1892
1893		if save_to_file:
1894			if not os.path.exists(dir):
1895				os.makedirs(dir)
1896			if filename is None:
1897				filename = f'D{self._4x}_sessions.csv'
1898			with open(f'{dir}/{filename}', 'w') as fid:
1899				fid.write(make_csv(out))
1900		if print_out:
1901			self.msg('\n' + pretty_table(out))
1902		if output == 'raw':
1903			return out
1904		elif output == 'pretty':
1905			return pretty_table(out)
1906
1907
1908	@make_verbal
1909	def table_of_analyses(
1910		self,
1911		dir = 'output',
1912		filename = None,
1913		save_to_file = True,
1914		print_out = True,
1915		output = None,
1916		):
1917		'''
1918		Print out and/or save to disk a table of analyses.
1919
1920		**Parameters**
1921
1922		+ `dir`: the directory in which to save the table
1923		+ `filename`: the name of the csv file to write to
1924		+ `save_to_file`: whether to save the table to disk
1925		+ `print_out`: whether to print out the table
1926		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
1927		    if set to `'raw'`: return a list of list of strings
1928		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
1929		'''
1930
1931		out = [['UID','Session','Sample']]
1932		extra_fields = [f for f in [('SampleMass','.2f'),('ColdFingerPressure','.1f'),('AcidReactionYield','.3f')] if f[0] in {k for r in self for k in r}]
1933		for f in extra_fields:
1934			out[-1] += [f[0]]
1935		out[-1] += ['d13Cwg_VPDB','d18Owg_VSMOW','d45','d46','d47','d48','d49','d13C_VPDB','d18O_VSMOW','D47raw','D48raw','D49raw',f'D{self._4x}']
1936		for r in self:
1937			out += [[f"{r['UID']}",f"{r['Session']}",f"{r['Sample']}"]]
1938			for f in extra_fields:
1939				out[-1] += [f"{r[f[0]]:{f[1]}}"]
1940			out[-1] += [
1941				f"{r['d13Cwg_VPDB']:.3f}",
1942				f"{r['d18Owg_VSMOW']:.3f}",
1943				f"{r['d45']:.6f}",
1944				f"{r['d46']:.6f}",
1945				f"{r['d47']:.6f}",
1946				f"{r['d48']:.6f}",
1947				f"{r['d49']:.6f}",
1948				f"{r['d13C_VPDB']:.6f}",
1949				f"{r['d18O_VSMOW']:.6f}",
1950				f"{r['D47raw']:.6f}",
1951				f"{r['D48raw']:.6f}",
1952				f"{r['D49raw']:.6f}",
1953				f"{r[f'D{self._4x}']:.6f}"
1954				]
1955		if save_to_file:
1956			if not os.path.exists(dir):
1957				os.makedirs(dir)
1958			if filename is None:
1959				filename = f'D{self._4x}_analyses.csv'
1960			with open(f'{dir}/{filename}', 'w') as fid:
1961				fid.write(make_csv(out))
1962		if print_out:
1963			self.msg('\n' + pretty_table(out))
1964		if output == 'raw':
			return out
		elif output == 'pretty':
			return pretty_table(out)
1965
1966	@make_verbal
1967	def covar_table(
1968		self,
1969		correl = False,
1970		dir = 'output',
1971		filename = None,
1972		save_to_file = True,
1973		print_out = True,
1974		output = None,
1975		):
1976		'''
1977		Print out, save to disk and/or return the variance-covariance matrix of D4x
1978		for all unknown samples.
1979
1980		**Parameters**
1981
1982		+ `dir`: the directory in which to save the csv
1983		+ `filename`: the name of the csv file to write to
1984		+ `save_to_file`: whether to save the csv
1985		+ `print_out`: whether to print out the matrix
1986		+ `output`: if set to `'pretty'`: return a pretty text matrix (see `pretty_table()`);
1987		    if set to `'raw'`: return a list of list of strings
1988		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
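
		**Example**

		A minimal sketch returning the error correlation matrix as a list of
		lists of strings, without writing or printing anything:

		```py
		correl = mydata.covar_table(
			correl = True,
			save_to_file = False,
			print_out = False,
			output = 'raw',
			)
		```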
1989		'''
1990		samples = sorted([u for u in self.unknowns])
1991		out = [[''] + samples]
1992		for s1 in samples:
1993			out.append([s1])
1994			for s2 in samples:
1995				if correl:
1996					out[-1].append(f'{self.sample_D4x_correl(s1, s2):.6f}')
1997				else:
1998					out[-1].append(f'{self.sample_D4x_covar(s1, s2):.8e}')
1999
2000		if save_to_file:
2001			if not os.path.exists(dir):
2002				os.makedirs(dir)
2003			if filename is None:
2004				if correl:
2005					filename = f'D{self._4x}_correl.csv'
2006				else:
2007					filename = f'D{self._4x}_covar.csv'
2008			with open(f'{dir}/{filename}', 'w') as fid:
2009				fid.write(make_csv(out))
2010		if print_out:
2011			self.msg('\n'+pretty_table(out))
2012		if output == 'raw':
2013			return out
2014		elif output == 'pretty':
2015			return pretty_table(out)
2016
2017	@make_verbal
2018	def table_of_samples(
2019		self,
2020		dir = 'output',
2021		filename = None,
2022		save_to_file = True,
2023		print_out = True,
2024		output = None,
2025		):
2026		'''
2027		Print out, save to disk and/or return a table of samples.
2028
2029		**Parameters**
2030
2031		+ `dir`: the directory in which to save the csv
2032		+ `filename`: the name of the csv file to write to
2033		+ `save_to_file`: whether to save the csv
2034		+ `print_out`: whether to print out the table
2035		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
2036		    if set to `'raw'`: return a list of list of strings
2037		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
2038		'''
2039
2040		out = [['Sample','N','d13C_VPDB','d18O_VSMOW',f'D{self._4x}','SE','95% CL','SD','p_Levene']]
2041		for sample in self.anchors:
2042			out += [[
2043				f"{sample}",
2044				f"{self.samples[sample]['N']}",
2045				f"{self.samples[sample]['d13C_VPDB']:.2f}",
2046				f"{self.samples[sample]['d18O_VSMOW']:.2f}",
2047				f"{self.samples[sample][f'D{self._4x}']:.4f}",'','',
2048				f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '', ''
2049				]]
2050		for sample in self.unknowns:
2051			out += [[
2052				f"{sample}",
2053				f"{self.samples[sample]['N']}",
2054				f"{self.samples[sample]['d13C_VPDB']:.2f}",
2055				f"{self.samples[sample]['d18O_VSMOW']:.2f}",
2056				f"{self.samples[sample][f'D{self._4x}']:.4f}",
2057				f"{self.samples[sample][f'SE_D{self._4x}']:.4f}",
2058				f"{self.samples[sample][f'SE_D{self._4x}'] * self.t95:.4f}",
2059				f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '',
2060				f"{self.samples[sample]['p_Levene']:.3f}" if self.samples[sample]['N'] > 2 else ''
2061				]]
2062		if save_to_file:
2063			if not os.path.exists(dir):
2064				os.makedirs(dir)
2065			if filename is None:
2066				filename = f'D{self._4x}_samples.csv'
2067			with open(f'{dir}/{filename}', 'w') as fid:
2068				fid.write(make_csv(out))
2069		if print_out:
2070			self.msg('\n'+pretty_table(out))
2071		if output == 'raw':
2072			return out
2073		elif output == 'pretty':
2074			return pretty_table(out)
2075
2076
2077	def plot_sessions(self, dir = 'output', figsize = (8,8), filetype = 'pdf', dpi = 100):
2078		'''
2079		Generate session plots and save them to disk.
2080
2081		**Parameters**
2082
2083		+ `dir`: the directory in which to save the plots
2084		+ `figsize`: the width and height (in inches) of each plot
2085		+ `filetype`: 'pdf' or 'png'
2086		+ `dpi`: resolution for PNG output
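
		**Example**

		A minimal sketch saving one PNG plot per session:

		```py
		mydata.plot_sessions(filetype = 'png', dpi = 200)
		```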
2087		'''
2088		if not os.path.exists(dir):
2089			os.makedirs(dir)
2090
2091		for session in self.sessions:
2092			sp = self.plot_single_session(session, xylimits = 'constant')
2093			ppl.savefig(f'{dir}/D{self._4x}_plot_{session}.{filetype}', **({'dpi': dpi} if filetype.lower() == 'png' else {}))
2094			ppl.close(sp.fig)
2095
2096
2097	@make_verbal
2098	def consolidate_samples(self):
2099		'''
2100		Compile various statistics for each sample.
2101
2102		For each anchor sample:
2103
2104		+ `D47` or `D48`: the nominal Δ4x value for this anchor, specified by `self.Nominal_D4x`
2105		+ `SE_D47` or `SE_D48`: set to zero by definition
2106
2107		For each unknown sample:
2108
2109		+ `D47` or `D48`: the standardized Δ4x value for this unknown
2110		+ `SE_D47` or `SE_D48`: the standard error of Δ4x for this unknown
2111
2112		For each anchor and unknown:
2113
2114		+ `N`: the total number of analyses of this sample
2115		+ `SD_D47` or `SD_D48`: the “sample” (in the statistical sense) standard deviation for this sample
2116		+ `d13C_VPDB`: the average δ13C_VPDB value for this sample
2117		+ `d18O_VSMOW`: the average δ18O_VSMOW value for this sample (as CO2)
2118		+ `p_Levene`: the p-value from a [Levene test](https://en.wikipedia.org/wiki/Levene%27s_test) of equal
2119		variance, indicating whether the Δ4x repeatability of this sample differs significantly from
2120		that observed for the reference sample specified by `self.LEVENE_REF_SAMPLE`.
2121		'''
2122		D4x_ref_pop = [r[f'D{self._4x}'] for r in self.samples[self.LEVENE_REF_SAMPLE]['data']]
2123		for sample in self.samples:
2124			self.samples[sample]['N'] = len(self.samples[sample]['data'])
2125			if self.samples[sample]['N'] > 1:
2126				self.samples[sample][f'SD_D{self._4x}'] = stdev([r[f'D{self._4x}'] for r in self.samples[sample]['data']])
2127
2128			self.samples[sample]['d13C_VPDB'] = np.mean([r['d13C_VPDB'] for r in self.samples[sample]['data']])
2129			self.samples[sample]['d18O_VSMOW'] = np.mean([r['d18O_VSMOW'] for r in self.samples[sample]['data']])
2130
2131			D4x_pop = [r[f'D{self._4x}'] for r in self.samples[sample]['data']]
2132			if len(D4x_pop) > 2:
2133				self.samples[sample]['p_Levene'] = levene(D4x_ref_pop, D4x_pop, center = 'median')[1]
2134			
2135		if self.standardization_method == 'pooled':
2136			for sample in self.anchors:
2137				self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
2138				self.samples[sample][f'SE_D{self._4x}'] = 0.
2139			for sample in self.unknowns:
2140				self.samples[sample][f'D{self._4x}'] = self.standardization.params.valuesdict()[f'D{self._4x}_{pf(sample)}']
2141				try:
2142					self.samples[sample][f'SE_D{self._4x}'] = self.sample_D4x_covar(sample)**.5
2143				except ValueError:
2144					# when `sample` is constrained by self.standardize(constraints = {...}),
2145					# it is no longer listed in self.standardization.var_names.
2146					# Temporary fix: define SE as zero for now
2147					self.samples[sample][f'SE_D{self._4x}'] = 0.
2148
2149		elif self.standardization_method == 'indep_sessions':
2150			for sample in self.anchors:
2151				self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
2152				self.samples[sample][f'SE_D{self._4x}'] = 0.
2153			for sample in self.unknowns:
2154				self.msg(f'Consolidating sample {sample}')
2155				self.unknowns[sample][f'session_D{self._4x}'] = {}
2156				session_avg = []
2157				for session in self.sessions:
2158					sdata = [r for r in self.sessions[session]['data'] if r['Sample'] == sample]
2159					if sdata:
2160						self.msg(f'{sample} found in session {session}')
2161						avg_D4x = np.mean([r[f'D{self._4x}'] for r in sdata])
2162						avg_d4x = np.mean([r[f'd{self._4x}'] for r in sdata])
2163						# !! TODO: sigma_s below does not account for temporal changes in standardization error
2164						sigma_s = self.standardization_error(session, avg_d4x, avg_D4x)
2165						sigma_u = sdata[0][f'wD{self._4x}raw'] / self.sessions[session]['a'] / len(sdata)**.5
2166						session_avg.append([avg_D4x, (sigma_u**2 + sigma_s**2)**.5])
2167						self.unknowns[sample][f'session_D{self._4x}'][session] = session_avg[-1]
2168				self.samples[sample][f'D{self._4x}'], self.samples[sample][f'SE_D{self._4x}'] = w_avg(*zip(*session_avg))
2169				weights = {s: self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 for s in self.unknowns[sample][f'session_D{self._4x}']}
2170				wsum = sum([weights[s] for s in weights])
2171				for s in weights:
2172					self.unknowns[sample][f'session_D{self._4x}'][s] += [self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 / wsum]
2173
2174		for r in self:
2175			r[f'D{self._4x}_residual'] = r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}']
2176
2177
2178
2179	def consolidate_sessions(self):
2180		'''
2181		Compute various statistics for each session.
2182
2183		+ `Na`: Number of anchor analyses in the session
2184		+ `Nu`: Number of unknown analyses in the session
2185		+ `r_d13C_VPDB`: δ13C_VPDB repeatability of analyses within the session
2186		+ `r_d18O_VSMOW`: δ18O_VSMOW repeatability of analyses within the session
2187		+ `r_D47` or `r_D48`: Δ4x repeatability of analyses within the session
2188		+ `a`: scrambling factor
2189		+ `b`: compositional slope
2190		+ `c`: WG offset
2191		+ `SE_a`: Model standard error of `a`
2192		+ `SE_b`: Model standard error of `b`
2193		+ `SE_c`: Model standard error of `c`
2194		+ `scrambling_drift` (boolean): whether to allow a temporal drift in the scrambling factor (`a`)
2195		+ `slope_drift` (boolean): whether to allow a temporal drift in the compositional slope (`b`)
2196		+ `wg_drift` (boolean): whether to allow a temporal drift in the WG offset (`c`)
2197		+ `a2`: scrambling factor drift
2198		+ `b2`: compositional slope drift
2199		+ `c2`: WG offset drift
2200		+ `Np`: Number of standardization parameters to fit
2201		+ `CM`: model covariance matrix for (`a`, `b`, `c`, `a2`, `b2`, `c2`)
2202		+ `d13Cwg_VPDB`: δ13C_VPDB of WG
2203		+ `d18Owg_VSMOW`: δ18O_VSMOW of WG
2204		'''
2205		for session in self.sessions:
2206			if 'd13Cwg_VPDB' not in self.sessions[session]:
2207				self.sessions[session]['d13Cwg_VPDB'] = self.sessions[session]['data'][0]['d13Cwg_VPDB']
2208			if 'd18Owg_VSMOW' not in self.sessions[session]:
2209				self.sessions[session]['d18Owg_VSMOW'] = self.sessions[session]['data'][0]['d18Owg_VSMOW']
2210			self.sessions[session]['Na'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.anchors])
2211			self.sessions[session]['Nu'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns])
2212
2213			self.msg(f'Computing repeatabilities for session {session}')
2214			self.sessions[session]['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors', sessions = [session])
2215			self.sessions[session]['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors', sessions = [session])
2216			self.sessions[session][f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', sessions = [session])
2217
2218		if self.standardization_method == 'pooled':
2219			for session in self.sessions:
2220
2221				self.sessions[session]['a'] = self.standardization.params.valuesdict()[f'a_{pf(session)}']
2222				i = self.standardization.var_names.index(f'a_{pf(session)}')
2223				self.sessions[session]['SE_a'] = self.standardization.covar[i,i]**.5
2224
2225				self.sessions[session]['b'] = self.standardization.params.valuesdict()[f'b_{pf(session)}']
2226				i = self.standardization.var_names.index(f'b_{pf(session)}')
2227				self.sessions[session]['SE_b'] = self.standardization.covar[i,i]**.5
2228
2229				self.sessions[session]['c'] = self.standardization.params.valuesdict()[f'c_{pf(session)}']
2230				i = self.standardization.var_names.index(f'c_{pf(session)}')
2231				self.sessions[session]['SE_c'] = self.standardization.covar[i,i]**.5
2232
2233				self.sessions[session]['a2'] = self.standardization.params.valuesdict()[f'a2_{pf(session)}']
2234				if self.sessions[session]['scrambling_drift']:
2235					i = self.standardization.var_names.index(f'a2_{pf(session)}')
2236					self.sessions[session]['SE_a2'] = self.standardization.covar[i,i]**.5
2237				else:
2238					self.sessions[session]['SE_a2'] = 0.
2239
2240				self.sessions[session]['b2'] = self.standardization.params.valuesdict()[f'b2_{pf(session)}']
2241				if self.sessions[session]['slope_drift']:
2242					i = self.standardization.var_names.index(f'b2_{pf(session)}')
2243					self.sessions[session]['SE_b2'] = self.standardization.covar[i,i]**.5
2244				else:
2245					self.sessions[session]['SE_b2'] = 0.
2246
2247				self.sessions[session]['c2'] = self.standardization.params.valuesdict()[f'c2_{pf(session)}']
2248				if self.sessions[session]['wg_drift']:
2249					i = self.standardization.var_names.index(f'c2_{pf(session)}')
2250					self.sessions[session]['SE_c2'] = self.standardization.covar[i,i]**.5
2251				else:
2252					self.sessions[session]['SE_c2'] = 0.
2253
2254				i = self.standardization.var_names.index(f'a_{pf(session)}')
2255				j = self.standardization.var_names.index(f'b_{pf(session)}')
2256				k = self.standardization.var_names.index(f'c_{pf(session)}')
2257				CM = np.zeros((6,6))
2258				CM[:3,:3] = self.standardization.covar[[i,j,k],:][:,[i,j,k]]
2259				try:
2260					i2 = self.standardization.var_names.index(f'a2_{pf(session)}')
2261					CM[3,[0,1,2,3]] = self.standardization.covar[i2,[i,j,k,i2]]
2262					CM[[0,1,2,3],3] = self.standardization.covar[[i,j,k,i2],i2]
2263					try:
2264						j2 = self.standardization.var_names.index(f'b2_{pf(session)}')
2265						CM[3,4] = self.standardization.covar[i2,j2]
2266						CM[4,3] = self.standardization.covar[j2,i2]
2267					except ValueError:
2268						pass
2269					try:
2270						k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2271						CM[3,5] = self.standardization.covar[i2,k2]
2272						CM[5,3] = self.standardization.covar[k2,i2]
2273					except ValueError:
2274						pass
2275				except ValueError:
2276					pass
2277				try:
2278					j2 = self.standardization.var_names.index(f'b2_{pf(session)}')
2279					CM[4,[0,1,2,4]] = self.standardization.covar[j2,[i,j,k,j2]]
2280					CM[[0,1,2,4],4] = self.standardization.covar[[i,j,k,j2],j2]
2281					try:
2282						k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2283						CM[4,5] = self.standardization.covar[j2,k2]
2284						CM[5,4] = self.standardization.covar[k2,j2]
2285					except ValueError:
2286						pass
2287				except ValueError:
2288					pass
2289				try:
2290					k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2291					CM[5,[0,1,2,5]] = self.standardization.covar[k2,[i,j,k,k2]]
2292					CM[[0,1,2,5],5] = self.standardization.covar[[i,j,k,k2],k2]
2293				except ValueError:
2294					pass
2295
2296				self.sessions[session]['CM'] = CM
2297
2298		elif self.standardization_method == 'indep_sessions':
2299			pass # Not implemented yet
2300
2301
2302	@make_verbal
2303	def repeatabilities(self):
2304		'''
2305		Compute analytical repeatabilities for δ13C_VPDB, δ18O_VSMOW, Δ4x
2306		(for all samples, for anchors, and for unknowns).
2307		'''
2308		self.msg('Computing repeatabilities for all sessions')
2309
2310		self.repeatability['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors')
2311		self.repeatability['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors')
2312		self.repeatability[f'r_D{self._4x}a'] = self.compute_r(f'D{self._4x}', samples = 'anchors')
2313		self.repeatability[f'r_D{self._4x}u'] = self.compute_r(f'D{self._4x}', samples = 'unknowns')
2314		self.repeatability[f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', samples = 'all samples')
2315
2316
2317	@make_verbal
2318	def consolidate(self, tables = True, plots = True):
2319		'''
2320		Collect information about samples, sessions and repeatabilities.
2321		'''
2322		self.consolidate_samples()
2323		self.consolidate_sessions()
2324		self.repeatabilities()
2325
2326		if tables:
2327			self.summary()
2328			self.table_of_sessions()
2329			self.table_of_analyses()
2330			self.table_of_samples()
2331
2332		if plots:
2333			self.plot_sessions()
2334
2335
2336	@make_verbal
2337	def rmswd(self,
2338		samples = 'all samples',
2339		sessions = 'all sessions',
2340		):
2341		'''
2342		Compute the χ2, the root mean squared weighted deviation
2343		(i.e. the square root of the reduced χ2), and the corresponding degrees of freedom
2344		of the Δ4x values for samples in `samples` and sessions in `sessions`.
2345		
2346		Only used in `D4xdata.standardize()` with `method='indep_sessions'`.
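
		**Example**

		A minimal sketch, assuming `standardize(method = 'indep_sessions')` was
		called beforehand:

		```py
		rmswd_unknowns = mydata.rmswd(samples = 'unknowns')['rmswd']
		```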
2347		'''
2348		if samples == 'all samples':
2349			mysamples = [k for k in self.samples]
2350		elif samples == 'anchors':
2351			mysamples = [k for k in self.anchors]
2352		elif samples == 'unknowns':
2353			mysamples = [k for k in self.unknowns]
2354		else:
2355			mysamples = samples
2356
2357		if sessions == 'all sessions':
2358			sessions = [k for k in self.sessions]
2359
2360		chisq, Nf = 0, 0
2361		for sample in mysamples :
2362			G = [ r for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2363			if len(G) > 1 :
2364				X, sX = w_avg([r[f'D{self._4x}'] for r in G], [r[f'wD{self._4x}'] for r in G])
2365				Nf += (len(G) - 1)
2366				chisq += np.sum([ ((r[f'D{self._4x}']-X)/r[f'wD{self._4x}'])**2 for r in G])
2367		r = (chisq / Nf)**.5 if Nf > 0 else 0
2368		self.msg(f'RMSWD of r["D{self._4x}"] is {r:.6f} for {samples}.')
2369		return {'rmswd': r, 'chisq': chisq, 'Nf': Nf}
2370
2371	
2372	@make_verbal
2373	def compute_r(self, key, samples = 'all samples', sessions = 'all sessions'):
2374		'''
2375		Compute the repeatability of `[r[key] for r in self]`
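
		**Example**

		A minimal sketch computing the Δ47 repeatability of anchors only, for a
		standardized instance `mydata`:

		```py
		r = mydata.compute_r('D47', samples = 'anchors')
		```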
2376		'''
2377
2378		if samples == 'all samples':
2379			mysamples = [k for k in self.samples]
2380		elif samples == 'anchors':
2381			mysamples = [k for k in self.anchors]
2382		elif samples == 'unknowns':
2383			mysamples = [k for k in self.unknowns]
2384		else:
2385			mysamples = samples
2386
2387		if sessions == 'all sessions':
2388			sessions = [k for k in self.sessions]
2389
2390		if key in ['D47', 'D48']:
2391			# Full disclosure: the definition of Nf is tricky/debatable
2392			G = [r for r in self if r['Sample'] in mysamples and r['Session'] in sessions]
2393			chisq = (np.array([r[f'{key}_residual'] for r in G])**2).sum()
2394			Nf = len(G)
2395# 			print(f'len(G) = {Nf}')
2396			Nf -= len([s for s in mysamples if s in self.unknowns])
2397# 			print(f'{len([s for s in mysamples if s in self.unknowns])} unknown samples to consider')
2398			for session in sessions:
2399				Np = len([
2400					_ for _ in self.standardization.params
2401					if (
2402						self.standardization.params[_].expr is not None
2403						and (
2404							(_[0] in 'abc' and _[1] == '_' and _[2:] == pf(session))
2405							or (_[0] in 'abc' and _[1:3] == '2_' and _[3:] == pf(session))
2406							)
2407						)
2408					])
2409# 				print(f'session {session}: {Np} parameters to consider')
2410				Na = len({
2411					r['Sample'] for r in self.sessions[session]['data']
2412					if r['Sample'] in self.anchors and r['Sample'] in mysamples
2413					})
2414# 				print(f'session {session}: {Na} different anchors in that session')
2415				Nf -= min(Np, Na)
2416# 			print(f'Nf = {Nf}')
2417
2418# 			for sample in mysamples :
2419# 				X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2420# 				if len(X) > 1 :
2421# 					chisq += np.sum([ (x-self.samples[sample][key])**2 for x in X ])
2422# 					if sample in self.unknowns:
2423# 						Nf += len(X) - 1
2424# 					else:
2425# 						Nf += len(X)
2426# 			if samples in ['anchors', 'all samples']:
2427# 				Nf -= sum([self.sessions[s]['Np'] for s in sessions])
2428			r = (chisq / Nf)**.5 if Nf > 0 else 0
2429
2430		else: # if key not in ['D47', 'D48']
2431			chisq, Nf = 0, 0
2432			for sample in mysamples :
2433				X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2434				if len(X) > 1 :
2435					Nf += len(X) - 1
2436					chisq += np.sum([ (x-np.mean(X))**2 for x in X ])
2437			r = (chisq / Nf)**.5 if Nf > 0 else 0
2438
2439		self.msg(f'Repeatability of r["{key}"] is {1000*r:.1f} ppm for {samples}.')
2440		return r
2441
2442	def sample_average(self, samples, weights = 'equal', normalize = True):
2443		'''
2444		Weighted average Δ4x value of a group of samples, accounting for covariance.
2445
2446		Returns the weighted average Δ4x value and associated SE
2447		of a group of samples. Weights are equal by default. If `normalize` is
2448		true, `weights` will be rescaled so that their sum equals 1.
2449
2450		**Examples**
2451
2452		```python
2453		self.sample_average(['X','Y'], [1, 2])
2454		```
2455
2456		returns the value and SE of [Δ4x(X) + 2 Δ4x(Y)]/3,
2457		where Δ4x(X) and Δ4x(Y) are the average Δ4x
2458		values of samples X and Y, respectively.
2459
2460		```python
2461		self.sample_average(['X','Y'], [1, -1], normalize = False)
2462		```
2463
2464		returns the value and SE of the difference Δ4x(X) - Δ4x(Y).
2465		'''
2466		if weights == 'equal':
2467			weights = [1/len(samples)] * len(samples)
2468
2469		if normalize:
2470			s = sum(weights)
2471			if s:
2472				weights = [w/s for w in weights]
2473
2474		try:
2475# 			indices = [self.standardization.var_names.index(f'D47_{pf(sample)}') for sample in samples]
2476# 			C = self.standardization.covar[indices,:][:,indices]
2477			C = np.array([[self.sample_D4x_covar(x, y) for x in samples] for y in samples])
2478			X = [self.samples[sample][f'D{self._4x}'] for sample in samples]
2479			return correlated_sum(X, C, weights)
2480		except ValueError:
2481			return (0., 0.)
2482
2483
2484	def sample_D4x_covar(self, sample1, sample2 = None):
2485		'''
2486		Covariance between Δ4x values of samples
2487
2488		Returns the error covariance between the average Δ4x values of two
2489		samples. If only `sample1` is specified, or if `sample1 == sample2`,
2490		returns the Δ4x variance for that sample.
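
		**Example**

		A hedged sketch with hypothetical unknown sample names:

		```py
		cov = mydata.sample_D4x_covar('MYSAMPLE-1', 'MYSAMPLE-2')  # error covariance
		var = mydata.sample_D4x_covar('MYSAMPLE-1')                # error variance
		```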
2491		'''
2492		if sample2 is None:
2493			sample2 = sample1
2494		if self.standardization_method == 'pooled':
2495			i = self.standardization.var_names.index(f'D{self._4x}_{pf(sample1)}')
2496			j = self.standardization.var_names.index(f'D{self._4x}_{pf(sample2)}')
2497			return self.standardization.covar[i, j]
2498		elif self.standardization_method == 'indep_sessions':
2499			if sample1 == sample2:
2500				return self.samples[sample1][f'SE_D{self._4x}']**2
2501			else:
2502				c = 0
2503				for session in self.sessions:
2504					sdata1 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample1]
2505					sdata2 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample2]
2506					if sdata1 and sdata2:
2507						a = self.sessions[session]['a']
2508						# !! TODO: CM below does not account for temporal changes in standardization parameters
2509						CM = self.sessions[session]['CM'][:3,:3]
2510						avg_D4x_1 = np.mean([r[f'D{self._4x}'] for r in sdata1])
2511						avg_d4x_1 = np.mean([r[f'd{self._4x}'] for r in sdata1])
2512						avg_D4x_2 = np.mean([r[f'D{self._4x}'] for r in sdata2])
2513						avg_d4x_2 = np.mean([r[f'd{self._4x}'] for r in sdata2])
2514						c += (
2515							self.unknowns[sample1][f'session_D{self._4x}'][session][2]
2516							* self.unknowns[sample2][f'session_D{self._4x}'][session][2]
2517							* np.array([[avg_D4x_1, avg_d4x_1, 1]])
2518							@ CM
2519							@ np.array([[avg_D4x_2, avg_d4x_2, 1]]).T
2520							) / a**2
2521				return float(c)
2522
2523	def sample_D4x_correl(self, sample1, sample2 = None):
2524		'''
2525		Correlation between Δ4x errors of samples
2526
2527		Returns the error correlation between the average Δ4x values of two samples.
2528		'''
2529		if sample2 is None or sample2 == sample1:
2530			return 1.
2531		return (
2532			self.sample_D4x_covar(sample1, sample2)
2533			/ self.unknowns[sample1][f'SE_D{self._4x}']
2534			/ self.unknowns[sample2][f'SE_D{self._4x}']
2535			)
2536
2537	def plot_single_session(self,
2538		session,
2539		kw_plot_anchors = dict(ls='None', marker='x', mec=(.75, 0, 0), mew = .75, ms = 4),
2540		kw_plot_unknowns = dict(ls='None', marker='x', mec=(0, 0, .75), mew = .75, ms = 4),
2541		kw_plot_anchor_avg = dict(ls='-', marker='None', color=(.75, 0, 0), lw = .75),
2542		kw_plot_unknown_avg = dict(ls='-', marker='None', color=(0, 0, .75), lw = .75),
2543		kw_contour_error = dict(colors = [[0, 0, 0]], alpha = .5, linewidths = 0.75),
2544		xylimits = 'free', # | 'constant'
2545		x_label = None,
2546		y_label = None,
2547		error_contour_interval = 'auto',
2548		fig = 'new',
2549		):
2550		'''
2551		Generate plot for a single session
2552		'''
2553		if x_label is None:
2554			x_label = f'δ$_{{{self._4x}}}$ (‰)'
2555		if y_label is None:
2556			y_label = f'Δ$_{{{self._4x}}}$ (‰)'
2557
2558		out = _SessionPlot()
2559		anchors = [a for a in self.anchors if [r for r in self.sessions[session]['data'] if r['Sample'] == a]]
2560		unknowns = [u for u in self.unknowns if [r for r in self.sessions[session]['data'] if r['Sample'] == u]]
2561		
2562		if fig == 'new':
2563			out.fig = ppl.figure(figsize = (6,6))
2564			ppl.subplots_adjust(.1,.1,.9,.9)
2565
2566		out.anchor_analyses, = ppl.plot(
2567			[r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors],
2568			[r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors],
2569			**kw_plot_anchors)
2570		out.unknown_analyses, = ppl.plot(
2571			[r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns],
2572			[r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns],
2573			**kw_plot_unknowns)
2574		out.anchor_avg = ppl.plot(
2575			np.array([ np.array([
2576				np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
2577				np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
2578				]) for sample in anchors]).T,
2579			np.array([ np.array([0, 0]) + self.Nominal_D4x[sample] for sample in anchors]).T,
2580			**kw_plot_anchor_avg)
2581		out.unknown_avg = ppl.plot(
2582			np.array([ np.array([
2583				np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
2584				np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
2585				]) for sample in unknowns]).T,
2586			np.array([ np.array([0, 0]) + self.unknowns[sample][f'D{self._4x}'] for sample in unknowns]).T,
2587			**kw_plot_unknown_avg)
2588		if xylimits == 'constant':
2589			x = [r[f'd{self._4x}'] for r in self]
2590			y = [r[f'D{self._4x}'] for r in self]
2591			x1, x2, y1, y2 = np.min(x), np.max(x), np.min(y), np.max(y)
2592			w, h = x2-x1, y2-y1
2593			x1 -= w/20
2594			x2 += w/20
2595			y1 -= h/20
2596			y2 += h/20
2597			ppl.axis([x1, x2, y1, y2])
2598		elif xylimits == 'free':
2599			x1, x2, y1, y2 = ppl.axis()
2600		else:
2601			x1, x2, y1, y2 = ppl.axis(xylimits)
2602				
2603		if error_contour_interval != 'none':
2604			xi, yi = np.linspace(x1, x2), np.linspace(y1, y2)
2605			XI,YI = np.meshgrid(xi, yi)
2606			SI = np.array([[self.standardization_error(session, x, y) for x in xi] for y in yi])
2607			if error_contour_interval == 'auto':
2608				rng = np.max(SI) - np.min(SI)
2609				if rng <= 0.01:
2610					cinterval = 0.001
2611				elif rng <= 0.03:
2612					cinterval = 0.004
2613				elif rng <= 0.1:
2614					cinterval = 0.01
2615				elif rng <= 0.3:
2616					cinterval = 0.03
2617				elif rng <= 1.:
2618					cinterval = 0.1
2619				else:
2620					cinterval = 0.5
2621			else:
2622				cinterval = error_contour_interval
2623
2624			cval = np.arange(np.ceil(SI.min() / .001) * .001, np.ceil(SI.max() / .001 + 1) * .001, cinterval)
2625			out.contour = ppl.contour(XI, YI, SI, cval, **kw_contour_error)
2626			out.clabel = ppl.clabel(out.contour)
2627
2628		ppl.xlabel(x_label)
2629		ppl.ylabel(y_label)
2630		ppl.title(session, weight = 'bold')
2631		ppl.grid(alpha = .2)
2632		out.ax = ppl.gca()		
2633
2634		return out
2635
2636	def plot_residuals(
2637		self,
2638		kde = False,
2639		hist = False,
2640		binwidth = 2/3,
2641		dir = 'output',
2642		filename = None,
2643		highlight = [],
2644		colors = None,
2645		figsize = None,
2646		dpi = 100,
2647		yspan = None,
2648		):
2649		'''
2650		Plot residuals of each analysis as a function of time (actually, as a function of
2651		the order of analyses in the `D4xdata` object)
2652
2653		+ `kde`: whether to add a kernel density estimate of residuals
2654		+ `hist`: whether to add a histogram of residuals (incompatible with `kde`)
		+ `binwidth`: the width of the histogram bins
2656		+ `dir`: the directory in which to save the plot
2657		+ `highlight`: a list of samples to highlight
2658		+ `colors`: a dict of `{<sample>: <color>}` for all samples
2659		+ `figsize`: (width, height) of figure
2660		+ `dpi`: resolution for PNG output
2661		+ `yspan`: factor controlling the range of y values shown in plot
2662		  (by default: `yspan = 1.5 if kde else 1.0`)
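
		**Example**

		A minimal sketch adding a kernel density estimate and highlighting one
		hypothetical sample:

		```py
		mydata.plot_residuals(kde = True, highlight = ['MYSAMPLE-1'])
		```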
2663		'''
2664		
2665		from matplotlib import ticker
2666
2667		if yspan is None:
2668			if kde:
2669				yspan = 1.5
2670			else:
2671				yspan = 1.0
2672		
2673		# Layout
2674		fig = ppl.figure(figsize = (8,4) if figsize is None else figsize)
2675		if hist or kde:
2676			ppl.subplots_adjust(left = .08, bottom = .05, right = .98, top = .8, wspace = -0.72)
2677			ax1, ax2 = ppl.subplot(121), ppl.subplot(1,15,15)
2678		else:
2679			ppl.subplots_adjust(.08,.05,.78,.8)
2680			ax1 = ppl.subplot(111)
2681		
2682		# Colors
2683		N = len(self.anchors)
2684		if colors is None:
2685			if len(highlight) > 0:
2686				Nh = len(highlight)
2687				if Nh == 1:
2688					colors = {highlight[0]: (0,0,0)}
2689				elif Nh == 3:
2690					colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0)])}
2691				elif Nh == 4:
2692					colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
2693				else:
2694					colors = {a: hls_to_rgb(k/Nh, .4, 1) for k,a in enumerate(highlight)}
2695			else:
2696				if N == 3:
2697					colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0)])}
2698				elif N == 4:
2699					colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
2700				else:
2701					colors = {a: hls_to_rgb(k/N, .4, 1) for k,a in enumerate(self.anchors)}
2702
2703		ppl.sca(ax1)
2704		
2705		ppl.axhline(0, color = 'k', alpha = .25, lw = 0.75)
2706
2707		ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: f'${x:+.0f}$' if x else '$0$'))
2708
2709		session = self[0]['Session']
2710		x1 = 0
2711# 		ymax = np.max([1e3 * (r['D47'] - self.samples[r['Sample']]['D47']) for r in self])
2712		x_sessions = {}
2713		one_or_more_singlets = False
2714		one_or_more_multiplets = False
2715		multiplets = set()
2716		for k,r in enumerate(self):
2717			if r['Session'] != session:
2718				x2 = k-1
2719				x_sessions[session] = (x1+x2)/2
2720				ppl.axvline(k - 0.5, color = 'k', lw = .5)
2721				session = r['Session']
2722				x1 = k
2723			singlet = len(self.samples[r['Sample']]['data']) == 1
2724			if not singlet:
2725				multiplets.add(r['Sample'])
2726			if r['Sample'] in self.unknowns:
2727				if singlet:
2728					one_or_more_singlets = True
2729				else:
2730					one_or_more_multiplets = True
2731			kw = dict(
2732				marker = 'x' if singlet else '+',
2733				ms = 4 if singlet else 5,
2734				ls = 'None',
2735				mec = colors[r['Sample']] if r['Sample'] in colors else (0,0,0),
2736				mew = 1,
2737				alpha = 0.2 if singlet else 1,
2738				)
2739			if highlight and r['Sample'] not in highlight:
2740				kw['alpha'] = 0.2
2741			ppl.plot(k, 1e3 * r[f'D{self._4x}_residual'], **kw)
2742		x2 = k
2743		x_sessions[session] = (x1+x2)/2
2744
2745		ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000, self.repeatability[f'r_D{self._4x}']*1000, color = 'k', alpha = .05, lw = 1)
2746		ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000*self.t95, self.repeatability[f'r_D{self._4x}']*1000*self.t95, color = 'k', alpha = .05, lw = 1)
2747		if not (hist or kde):
2748			ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000, f"   SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm", size = 9, alpha = 1, va = 'center')
2749			ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000*self.t95, f"   95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm", size = 9, alpha = 1, va = 'center')
2750
2751		xmin, xmax, ymin, ymax = ppl.axis()
2752		if yspan != 1:
2753			ymin, ymax = (ymin + ymax)/2 - yspan * (ymax - ymin)/2, (ymin + ymax)/2 + yspan * (ymax - ymin)/2
2754		for s in x_sessions:
2755			ppl.text(
2756				x_sessions[s],
2757				ymax +1,
2758				s,
2759				va = 'bottom',
2760				**(
2761					dict(ha = 'center')
2762					if len(self.sessions[s]['data']) > (0.15 * len(self))
2763					else dict(ha = 'left', rotation = 45)
2764					)
2765				)
2766
2767		if hist or kde:
2768			ppl.sca(ax2)
2769
2770		for s in colors:
2771			kw['marker'] = '+'
2772			kw['ms'] = 5
2773			kw['mec'] = colors[s]
2774			kw['label'] = s
2775			kw['alpha'] = 1
2776			ppl.plot([], [], **kw)
2777
2778		kw['mec'] = (0,0,0)
2779
2780		if one_or_more_singlets:
2781			kw['marker'] = 'x'
2782			kw['ms'] = 4
2783			kw['alpha'] = .2
2784			kw['label'] = 'other (N$\\,$=$\\,$1)' if one_or_more_multiplets else 'other'
2785			ppl.plot([], [], **kw)
2786
2787		if one_or_more_multiplets:
2788			kw['marker'] = '+'
2789			kw['ms'] = 4
2790			kw['alpha'] = 1
2791			kw['label'] = 'other (N$\\,$>$\\,$1)' if one_or_more_singlets else 'other'
2792			ppl.plot([], [], **kw)
2793
2794		if hist or kde:
2795			leg = ppl.legend(loc = 'upper right', bbox_to_anchor = (1, 1), bbox_transform=fig.transFigure, borderaxespad = 1.5, fontsize = 9)
2796		else:
2797			leg = ppl.legend(loc = 'lower right', bbox_to_anchor = (1, 0), bbox_transform=fig.transFigure, borderaxespad = 1.5)
2798		leg.set_zorder(-1000)
2799
2800		ppl.sca(ax1)
2801
2802		ppl.ylabel(f'Δ$_{{{self._4x}}}$ residuals (ppm)')
2803		ppl.xticks([])
2804		ppl.axis([-1, len(self), None, None])
2805
2806		if hist or kde:
2807			ppl.sca(ax2)
2808			X = 1e3 * np.array([r[f'D{self._4x}_residual'] for r in self if r['Sample'] in multiplets or r['Sample'] in self.anchors])
2809
2810			if kde:
2811				from scipy.stats import gaussian_kde
2812				yi = np.linspace(ymin, ymax, 201)
2813				xi = gaussian_kde(X).evaluate(yi)
2814				ppl.fill_betweenx(yi, xi, xi*0, fc = (0,0,0,.15), lw = 1, ec = (.75,.75,.75,1))
2815# 				ppl.plot(xi, yi, 'k-', lw = 1)
2816			elif hist:
2817				ppl.hist(
2818					X,
2819					orientation = 'horizontal',
2820					histtype = 'stepfilled',
2821					ec = [.4]*3,
2822					fc = [.25]*3,
2823					alpha = .25,
2824					bins = np.linspace(-9e3*self.repeatability[f'r_D{self._4x}'], 9e3*self.repeatability[f'r_D{self._4x}'], int(18/binwidth+1)),
2825					)
2826			ppl.text(0, 0,
2827				f"   SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm\n   95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm",
2828				size = 7.5,
2829				alpha = 1,
2830				va = 'center',
2831				ha = 'left',
2832				)
2833
2834			ppl.axis([0, None, ymin, ymax])
2835			ppl.xticks([])
2836			ppl.yticks([])
2837# 			ax2.spines['left'].set_visible(False)
2838			ax2.spines['right'].set_visible(False)
2839			ax2.spines['top'].set_visible(False)
2840			ax2.spines['bottom'].set_visible(False)
2841
2842		ax1.axis([None, None, ymin, ymax])
2843
2844		if not os.path.exists(dir):
2845			os.makedirs(dir)
2846		if filename is None:
2847			return fig
2848		elif filename == '':
2849			filename = f'D{self._4x}_residuals.pdf'
2850		ppl.savefig(f'{dir}/{filename}', dpi = dpi)
2851		ppl.close(fig)
2852				
2853
2854	def simulate(self, *args, **kwargs):
2855		'''
2856		Legacy function with warning message pointing to `virtual_data()`
2857		'''
2858		raise DeprecationWarning('D4xdata.simulate is deprecated and has been replaced by virtual_data()')
2859
2860	def plot_distribution_of_analyses(
2861		self,
2862		dir = 'output',
2863		filename = None,
2864		vs_time = False,
2865		figsize = (6,4),
2866		subplots_adjust = (0.02, 0.13, 0.85, 0.8),
2867		output = None,
2868		dpi = 100,
2869		):
2870		'''
2871		Plot temporal distribution of all analyses in the data set.
2872		
2873		**Parameters**
2874
2875		+ `dir`: the directory in which to save the plot
2876		+ `vs_time`: if `True`, plot as a function of `TimeTag` rather than sequentially.
2877		+ `filename`: the name of the file to save the plot to (by default: `D4x_distribution_of_analyses.pdf`)
2878		+ `figsize`: (width, height) of figure
2879		+ `dpi`: resolution for PNG output
2880		'''
2881
2882		asamples = [s for s in self.anchors]
2883		usamples = [s for s in self.unknowns]
2884		if output is None or output == 'fig':
2885			fig = ppl.figure(figsize = figsize)
2886			ppl.subplots_adjust(*subplots_adjust)
2887		Xmin = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
2888		Xmax = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
2889		Xmax += (Xmax-Xmin)/40
2890		Xmin -= (Xmax-Xmin)/41
2891		for k, s in enumerate(asamples + usamples):
2892			if vs_time:
2893				X = [r['TimeTag'] for r in self if r['Sample'] == s]
2894			else:
2895				X = [x for x,r in enumerate(self) if r['Sample'] == s]
2896			Y = [-k for x in X]
2897			ppl.plot(X, Y, 'o', mec = None, mew = 0, mfc = 'b' if s in usamples else 'r', ms = 3, alpha = .75)
2898			ppl.axhline(-k, color = 'b' if s in usamples else 'r', lw = .5, alpha = .25)
2899			ppl.text(Xmax, -k, f'   {s}', va = 'center', ha = 'left', size = 7, color = 'b' if s in usamples else 'r')
2900		ppl.axis([Xmin, Xmax, -k-1, 1])
2901		ppl.xlabel('\ntime')
2902		ppl.gca().annotate('',
2903			xy = (0.6, -0.02),
2904			xycoords = 'axes fraction',
2905			xytext = (.4, -0.02), 
2906            arrowprops = dict(arrowstyle = "->", color = 'k'),
2907            )
2908			
2909
2910		x2 = -1
2911		for session in self.sessions:
2912			x1 = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
2913			if vs_time:
2914				ppl.axvline(x1, color = 'k', lw = .75)
2915			if x2 > -1:
2916				if not vs_time:
2917					ppl.axvline((x1+x2)/2, color = 'k', lw = .75, alpha = .5)
2918			x2 = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
2919# 			from xlrd import xldate_as_datetime
2920# 			print(session, xldate_as_datetime(x1, 0), xldate_as_datetime(x2, 0))
2921			if vs_time:
2922				ppl.axvline(x2, color = 'k', lw = .75)
2923				ppl.axvspan(x1,x2,color = 'k', zorder = -100, alpha = .15)
2924			ppl.text((x1+x2)/2, 1, f' {session}', ha = 'left', va = 'bottom', rotation = 45, size = 8)
2925
2926		ppl.xticks([])
2927		ppl.yticks([])
2928
2929		if output is None:
2930			if not os.path.exists(dir):
2931				os.makedirs(dir)
2932			if filename is None:
2933				filename = f'D{self._4x}_distribution_of_analyses.pdf'
2934			ppl.savefig(f'{dir}/{filename}', dpi = dpi)
2935			ppl.close(fig)
2936		elif output == 'ax':
2937			return ppl.gca()
2938		elif output == 'fig':
2939			return fig
2940
2941
2942	def plot_bulk_compositions(
2943		self,
2944		samples = None,
2945		dir = 'output/bulk_compositions',
2946		figsize = (6,6),
2947		subplots_adjust = (0.15, 0.12, 0.95, 0.92),
2948		show = False,
2949		sample_color = (0,.5,1),
2950		analysis_color = (.7,.7,.7),
2951		labeldist = 0.3,
2952		radius = 0.05,
2953		):
2954		'''
2955		Plot δ13C_VPDB vs δ18O_VSMOW (of CO2) for all analyses.
2956		
2957		By default, creates a directory `./output/bulk_compositions` where plots for
2958		each sample are saved. Another plot named `__all__.pdf` shows all analyses together.
2959		
2960		
2961		**Parameters**
2962
2963		+ `samples`: Only these samples are processed (by default: all samples).
2964		+ `dir`: where to save the plots
2965		+ `figsize`: (width, height) of figure
2966		+ `subplots_adjust`: passed to `subplots_adjust()`
2967		+ `show`: whether to call `matplotlib.pyplot.show()` on the plot with all samples,
2968		allowing for interactive visualization/exploration in (δ13C, δ18O) space.
2969		+ `sample_color`: color used for sample markers/labels
2970		+ `analysis_color`: color used for replicate markers/labels
2971		+ `labeldist`: distance (in inches) from replicate markers to replicate labels
2972		+ `radius`: radius of the dashed circle providing scale. No circle if `radius = 0`.
2973		'''
2974
2975		from matplotlib.patches import Ellipse
2976
2977		if samples is None:
2978			samples = [_ for _ in self.samples]
2979
2980		saved = {}
2981
2982		for s in samples:
2983
2984			fig = ppl.figure(figsize = figsize)
2985			fig.subplots_adjust(*subplots_adjust)
2986			ax = ppl.subplot(111)
2987			ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
2988			ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')
2989			ppl.title(s)
2990
2991
2992			XY = np.array([[_['d18O_VSMOW'], _['d13C_VPDB']] for _ in self.samples[s]['data']])
2993			UID = [_['UID'] for _ in self.samples[s]['data']]
2994			XY0 = XY.mean(0)
2995
2996			for xy in XY:
2997				ppl.plot([xy[0], XY0[0]], [xy[1], XY0[1]], '-', lw = 1, color = analysis_color)
2998				
2999			ppl.plot(*XY.T, 'wo', mew = 1, mec = analysis_color)
3000			ppl.plot(*XY0, 'wo', mew = 2, mec = sample_color)
3001			ppl.text(*XY0, f'  {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
3002			saved[s] = [XY, XY0]
3003			
3004			x1, x2, y1, y2 = ppl.axis()
3005			x0, dx = (x1+x2)/2, (x2-x1)/2
3006			y0, dy = (y1+y2)/2, (y2-y1)/2
3007			dx, dy = [max(max(dx, dy), radius)]*2
3008
3009			ppl.axis([
3010				x0 - 1.2*dx,
3011				x0 + 1.2*dx,
3012				y0 - 1.2*dy,
3013				y0 + 1.2*dy,
3014				])			
3015
3016			XY0_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(XY0))
3017
3018			for xy, uid in zip(XY, UID):
3019
3020				xy_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(xy))
3021				vector_in_display_space = xy_in_display_space - XY0_in_display_space
3022
3023				if (vector_in_display_space**2).sum() > 0:
3024
3025					unit_vector_in_display_space = vector_in_display_space / ((vector_in_display_space**2).sum())**0.5
3026					label_vector_in_display_space = vector_in_display_space + unit_vector_in_display_space * labeldist
3027					label_xy_in_display_space = XY0_in_display_space + label_vector_in_display_space
3028					label_xy_in_data_space = ax.transData.inverted().transform(fig.dpi_scale_trans.transform(label_xy_in_display_space))
3029
3030					ppl.text(*label_xy_in_data_space, uid, va = 'center', ha = 'center', color = analysis_color)
3031
3032				else:
3033
3034					ppl.text(*xy, f'{uid}  ', va = 'center', ha = 'right', color = analysis_color)
3035
3036			if radius:
3037				ax.add_artist(Ellipse(
3038					xy = XY0,
3039					width = radius*2,
3040					height = radius*2,
3041					ls = (0, (2,2)),
3042					lw = .7,
3043					ec = analysis_color,
3044					fc = 'None',
3045					))
3046				ppl.text(
3047					XY0[0],
3048					XY0[1]-radius,
3049					f'\n± {radius*1e3:.0f} ppm',
3050					color = analysis_color,
3051					va = 'top',
3052					ha = 'center',
3053					linespacing = 0.4,
3054					size = 8,
3055					)
3056
3057			if not os.path.exists(dir):
3058				os.makedirs(dir)
3059			fig.savefig(f'{dir}/{s}.pdf')
3060			ppl.close(fig)
3061
3062		fig = ppl.figure(figsize = figsize)
3063		fig.subplots_adjust(*subplots_adjust)
3064		ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
3065		ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')
3066
3067		for s in saved:
3068			for xy in saved[s][0]:
3069				ppl.plot([xy[0], saved[s][1][0]], [xy[1], saved[s][1][1]], '-', lw = 1, color = analysis_color)
3070			ppl.plot(*saved[s][0].T, 'wo', mew = 1, mec = analysis_color)
3071			ppl.plot(*saved[s][1], 'wo', mew = 1.5, mec = sample_color)
3072			ppl.text(*saved[s][1], f'  {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
3073
3074		x1, x2, y1, y2 = ppl.axis()
3075		ppl.axis([
3076			x1 - (x2-x1)/10,
3077			x2 + (x2-x1)/10,
3078			y1 - (y2-y1)/10,
3079			y2 + (y2-y1)/10,
3080			])			
3081
3082
3083		if not os.path.exists(dir):
3084			os.makedirs(dir)
3085		fig.savefig(f'{dir}/__all__.pdf')
3086		if show:
3087			ppl.show()
3088		ppl.close(fig)
3089		
3090
3091	def _save_D4x_correl(
3092		self,
3093		samples = None,
3094		dir = 'output',
3095		filename = None,
3096		D4x_precision = 4,
3097		correl_precision = 4,
3098		):
3099		'''
3100		Save D4x values along with their SE and correlation matrix.
3101
3102		**Parameters**
3103
3104		+ `samples`: Only these samples are output (by default: all samples).
3105		+ `dir`: the directory in which to save the file (by default: `output`)
3106		+ `filename`: the name of the csv file to write to (by default: `D4x_correl.csv`)
3107		+ `D4x_precision`: the precision to use when writing `D4x` and `D4x_SE` values (by default: 4)
3108		+ `correl_precision`: the precision to use when writing correlation factor values (by default: 4)
3109		'''
3110		if samples is None:
3111			samples = sorted([s for s in self.unknowns])
3112		
3113		out = [['Sample']] + [[s] for s in samples]
3114		out[0] += [f'D{self._4x}', f'D{self._4x}_SE', f'D{self._4x}_correl']
3115		for k,s in enumerate(samples):
3116			out[k+1] += [f'{self.samples[s][f"D{self._4x}"]:.{D4x_precision}f}', f'{self.samples[s][f"SE_D{self._4x}"]:.{D4x_precision}f}']
3117			for s2 in samples:
3118				out[k+1] += [f'{self.sample_D4x_correl(s,s2):.{correl_precision}f}']
3119		
3120		if not os.path.exists(dir):
3121			os.makedirs(dir)
3122		if filename is None:
3123			filename = f'D{self._4x}_correl.csv'
3124		with open(f'{dir}/{filename}', 'w') as fid:
3125			fid.write(make_csv(out))

Store and process data for a large set of Δ47 and/or Δ48 analyses, usually comprising more than one analytical session.

D4xdata(l=[], mass='47', logfile='', session='mySession', verbose=False)
923	def __init__(self, l = [], mass = '47', logfile = '', session = 'mySession', verbose = False):
924		'''
925		**Parameters**
926
927		+ `l`: a list of dictionaries, with each dictionary including at least the keys
928		`Sample`, `d45`, `d46`, and `d47` or `d48`.
929		+ `mass`: `'47'` or `'48'`
930		+ `logfile`: if specified, write detailed logs to this file path when calling `D4xdata` methods.
931		+ `session`: define session name for analyses without a `Session` key
932		+ `verbose`: if `True`, print out detailed logs when calling `D4xdata` methods.
933
934		Returns a `D4xdata` object derived from `list`.
935		'''
936		self._4x = mass
937		self.verbose = verbose
938		self.prefix = 'D4xdata'
939		self.logfile = logfile
940		list.__init__(self, l)
941		self.Nf = None
942		self.repeatability = {}
943		self.refresh(session = session)

Parameters

  • l: a list of dictionaries, with each dictionary including at least the keys Sample, d45, d46, and d47 or d48.
  • mass: '47' or '48'
  • logfile: if specified, write detailed logs to this file path when calling D4xdata methods.
  • session: define session name for analyses without a Session key
  • verbose: if True, print out detailed logs when calling D4xdata methods.

Returns a D4xdata object derived from list.
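
For instance, here is a minimal sketch (with purely illustrative delta values) of building such an object directly from a list of dictionaries rather than from a csv file:

```py
import D47crunch

# Sketch: construct a D47data object (a D4xdata subclass with mass = '47')
# directly from a list of dictionaries. The numerical values are illustrative.
mydata = D47crunch.D47data([
	{'Sample': 'ETH-1', 'd45':  5.795, 'd46': 11.628, 'd47':  16.894},
	{'Sample': 'ETH-2', 'd45': -6.059, 'd46': -4.817, 'd47': -11.635},
	], session = 'Session01')
```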

R13_VPDB = 0.01118

Absolute (13C/12C) ratio of VPDB. By default equal to 0.01118 (Chang & Li, 1990)

R18_VSMOW = 0.0020052

Absolute (18O/16O) ratio of VSMOW. By default equal to 0.0020052 (Baertschi, 1976)

LAMBDA_17 = 0.528

Mass-dependent exponent for triple oxygen isotopes. By default equal to 0.528 (Barkan & Luz, 2005)

R17_VSMOW = 0.00038475

Absolute (17O/16O) ratio of VSMOW. By default equal to 0.00038475 (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)

R18_VPDB = 0.0020672007840000003

Absolute (18O/16O) ratio of VPDB. By definition equal to R18_VSMOW * 1.03092.

R17_VPDB = 0.0003909861828790272

Absolute (17O/16O) ratio of VPDB. By definition equal to R17_VSMOW * 1.03092 ** LAMBDA_17.

LEVENE_REF_SAMPLE = 'ETH-3'

After the Δ4x standardization step, each sample is tested to assess whether the Δ4x variance within all analyses for that sample differs significantly from that observed for a given reference sample (using Levene's test, which yields a p-value corresponding to the null hypothesis that the underlying variances are equal).

LEVENE_REF_SAMPLE (by default equal to 'ETH-3') specifies which sample should be used as a reference for this test.
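
As a sketch (reusing the `mydata` object from the tutorial), the reference sample may simply be reassigned:

```py
# Sketch: use ETH-1 rather than ETH-3 as the reference sample for Levene's test.
mydata.LEVENE_REF_SAMPLE = 'ETH-1'
```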

ALPHA_18O_ACID_REACTION = 1.008129

Specifies the 18O/16O fractionation factor generally applicable to acid reactions in the dataset. Currently used by D4xdata.wg(), D4xdata.standardize_d13C(), and D4xdata.standardize_d18O().

By default equal to 1.008129 (calcite reacted at 90 °C, Kim et al., 2007).

Nominal_d13C_VPDB = {'ETH-1': 2.02, 'ETH-2': -10.17, 'ETH-3': 1.71}

Nominal δ13C_VPDB values assigned to carbonate standards, used by D4xdata.standardize_d13C().

By default equal to {'ETH-1': 2.02, 'ETH-2': -10.17, 'ETH-3': 1.71} after Bernasconi et al. (2018).

Nominal_d18O_VPDB = {'ETH-1': -2.19, 'ETH-2': -18.69, 'ETH-3': -1.78}

Nominal δ18O_VPDB values assigned to carbonate standards, used by D4xdata.standardize_d18O().

By default equal to {'ETH-1': -2.19, 'ETH-2': -18.69, 'ETH-3': -1.78} after Bernasconi et al. (2018).

d13C_STANDARDIZATION_METHOD = '2pt'

Method by which to standardize δ13C values:

  • 'none': do not apply any δ13C standardization.
  • '1pt': within each session, offset all initial δ13C values so as to minimize the difference between final δ13C_VPDB values and Nominal_d13C_VPDB (averaged over all analyses for which Nominal_d13C_VPDB is defined).
  • '2pt': within each session, apply an affine transformation to all δ13C values so as to minimize the difference between final δ13C_VPDB values and Nominal_d13C_VPDB (averaged over all analyses for which Nominal_d13C_VPDB is defined).

d18O_STANDARDIZATION_METHOD = '2pt'

Method by which to standardize δ18O values:

  • 'none': do not apply any δ18O standardization.
  • '1pt': within each session, offset all initial δ18O values so as to minimize the difference between final δ18O_VPDB values and Nominal_d18O_VPDB (averaged over all analyses for which Nominal_d18O_VPDB is defined).
  • '2pt': within each session, apply an affine transformation to all δ18O values so as to minimize the difference between final δ18O_VPDB values and Nominal_d18O_VPDB (averaged over all analyses for which Nominal_d18O_VPDB is defined).
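
As a sketch (assuming a session named 'Session01' exists in `mydata`), both methods may also be overridden on a per-session basis before crunching the data:

```py
# Sketch: switch one session to single-point standardization while the other
# sessions keep the default '2pt' method. 'Session01' is a hypothetical name.
mydata.sessions['Session01']['d13C_standardization_method'] = '1pt'
mydata.sessions['Session01']['d18O_standardization_method'] = '1pt'
```
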
def make_verbal(oldfun):
946	def make_verbal(oldfun):
947		'''
948		Decorator: allow temporarily changing `self.prefix` and overriding `self.verbose`.
949		'''
950		@wraps(oldfun)
951		def newfun(*args, verbose = '', **kwargs):
952			myself = args[0]
953			oldprefix = myself.prefix
954			myself.prefix = oldfun.__name__
955			if verbose != '':
956				oldverbose = myself.verbose
957				myself.verbose = verbose
958			out = oldfun(*args, **kwargs)
959			myself.prefix = oldprefix
960			if verbose != '':
961				myself.verbose = oldverbose
962			return out
963		return newfun

Decorator: allow temporarily changing self.prefix and overriding self.verbose.

def msg(self, txt):
966	def msg(self, txt):
967		'''
968		Log a message to `self.logfile`, and print it out if `verbose = True`
969		'''
970		self.log(txt)
971		if self.verbose:
972			print(f'{f"[{self.prefix}]":<16} {txt}')

Log a message to self.logfile, and print it out if verbose = True

def vmsg(self, txt):
975	def vmsg(self, txt):
976		'''
977		Log a message to `self.logfile` and print it out
978		'''
979		self.log(txt)
980		print(txt)

Log a message to self.logfile and print it out

def log(self, *txts):
983	def log(self, *txts):
984		'''
985		Log a message to `self.logfile`
986		'''
987		if self.logfile:
988			with open(self.logfile, 'a') as fid:
989				for txt in txts:
990					fid.write(f'\n{dt.now().strftime("%Y-%m-%d %H:%M:%S")} {f"[{self.prefix}]":<16} {txt}')

Log a message to self.logfile

def refresh(self, session='mySession'):
993	def refresh(self, session = 'mySession'):
994		'''
995		Update `self.sessions`, `self.samples`, `self.anchors`, and `self.unknowns`.
996		'''
997		self.fill_in_missing_info(session = session)
998		self.refresh_sessions()
999		self.refresh_samples()

Update self.sessions, self.samples, self.anchors, and self.unknowns.

def refresh_sessions(self):
1002	def refresh_sessions(self):
1003		'''
1004		Update `self.sessions` and set `scrambling_drift`, `slope_drift`, and `wg_drift`
1005		to `False` for all sessions.
1006		'''
1007		self.sessions = {
1008			s: {'data': [r for r in self if r['Session'] == s]}
1009			for s in sorted({r['Session'] for r in self})
1010			}
1011		for s in self.sessions:
1012			self.sessions[s]['scrambling_drift'] = False
1013			self.sessions[s]['slope_drift'] = False
1014			self.sessions[s]['wg_drift'] = False
1015			self.sessions[s]['d13C_standardization_method'] = self.d13C_STANDARDIZATION_METHOD
1016			self.sessions[s]['d18O_standardization_method'] = self.d18O_STANDARDIZATION_METHOD

Update self.sessions and set scrambling_drift, slope_drift, and wg_drift to False for all sessions.

def refresh_samples(self):
1019	def refresh_samples(self):
1020		'''
1021		Define `self.samples`, `self.anchors`, and `self.unknowns`.
1022		'''
1023		self.samples = {
1024			s: {'data': [r for r in self if r['Sample'] == s]}
1025			for s in sorted({r['Sample'] for r in self})
1026			}
1027		self.anchors = {s: self.samples[s] for s in self.samples if s in self.Nominal_D4x}
1028		self.unknowns = {s: self.samples[s] for s in self.samples if s not in self.Nominal_D4x}

Define self.samples, self.anchors, and self.unknowns.

def read(self, filename, sep='', session=''):
1031	def read(self, filename, sep = '', session = ''):
1032		'''
1033		Read file in csv format to load data into a `D47data` object.
1034
1035		In the csv file, spaces before and after field separators (`','` by default)
1036		are optional. Each line corresponds to a single analysis.
1037
1038		The required fields are:
1039
1040		+ `UID`: a unique identifier
1041		+ `Session`: an identifier for the analytical session
1042		+ `Sample`: a sample identifier
1043		+ `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values
1044
1045		Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
1046		VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Working-gas deltas `d47`, `d48`
1047		and `d49` are optional, and set to NaN by default.
1048
1049		**Parameters**
1050
1051		+ `filename`: the path of the file to read
1052		+ `sep`: csv separator delimiting the fields
1053		+ `session`: set `Session` field to this string for all analyses
1054		'''
1055		with open(filename) as fid:
1056			self.input(fid.read(), sep = sep, session = session)

Read file in csv format to load data into a D47data object.

In the csv file, spaces before and after field separators (',' by default) are optional. Each line corresponds to a single analysis.

The required fields are:

  • UID: a unique identifier
  • Session: an identifier for the analytical session
  • Sample: a sample identifier
  • d45, d46, and at least one of d47 or d48: the working-gas delta values

Independently known oxygen-17 anomalies may be provided as D17O (in ‰ relative to VSMOW, λ = self.LAMBDA_17), and are otherwise assumed to be zero. Working-gas deltas d47, d48 and d49 are optional, and set to NaN by default.

Parameters

  • filename: the path of the file to read
  • sep: csv separator delimiting the fields
  • session: set Session field to this string for all analyses
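
For example, a sketch of the optional arguments in action, using a hypothetical semicolon-separated file:

```py
# Sketch: read a semicolon-separated file ('rawdata2.csv' is hypothetical)
# and assign all of its analyses to a single session.
mydata.read('rawdata2.csv', sep = ';', session = 'Session02')
```
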
def input(self, txt, sep='', session=''):
1059	def input(self, txt, sep = '', session = ''):
1060		'''
1061		Read `txt` string in csv format to load analysis data into a `D47data` object.
1062
1063		In the csv string, spaces before and after field separators (`','` by default)
1064		are optional. Each line corresponds to a single analysis.
1065
1066		The required fields are:
1067
1068		+ `UID`: a unique identifier
1069		+ `Session`: an identifier for the analytical session
1070		+ `Sample`: a sample identifier
1071		+ `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values
1072
1073		Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
1074		VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Working-gas deltas `d47`, `d48`
1075		and `d49` are optional, and set to NaN by default.
1076
1077		**Parameters**
1078
1079		+ `txt`: the csv string to read
1080		+ `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`,
1081		whichever appears most often in `txt`.
1082		+ `session`: set `Session` field to this string for all analyses
1083		'''
1084		if sep == '':
1085			sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
1086		txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
1087		data = [{k: v if k in ['UID', 'Session', 'Sample'] else smart_type(v) for k,v in zip(txt[0], l) if v != ''} for l in txt[1:]]
1088
1089		if session != '':
1090			for r in data:
1091				r['Session'] = session
1092
1093		self += data
1094		self.refresh()

Read txt string in csv format to load analysis data into a D47data object.

In the csv string, spaces before and after field separators (',' by default) are optional. Each line corresponds to a single analysis.

The required fields are:

  • UID: a unique identifier
  • Session: an identifier for the analytical session
  • Sample: a sample identifier
  • d45, d46, and at least one of d47 or d48: the working-gas delta values

Independently known oxygen-17 anomalies may be provided as D17O (in ‰ relative to VSMOW, λ = self.LAMBDA_17), and are otherwise assumed to be zero. Working-gas deltas d47, d48 and d49 are optional, and set to NaN by default.

Parameters

  • txt: the csv string to read
  • sep: csv separator delimiting the fields. By default, use ',', ';', or tab, whichever appears most often in txt.
  • session: set Session field to this string for all analyses
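
A minimal sketch (with illustrative values), loading analyses from an inline string instead of a file:

```py
import D47crunch

mydata = D47crunch.D47data()
# Sketch: the separator (',' here) is detected automatically.
mydata.input('''UID,Session,Sample,d45,d46,d47
A01,S1,ETH-1,5.795,11.628,16.894
A02,S1,ETH-2,-6.059,-4.817,-11.635''')
```
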
@make_verbal
def wg(self, samples=None, a18_acid=None):
1097	@make_verbal
1098	def wg(self, samples = None, a18_acid = None):
1099		'''
1100		Compute bulk composition of the working gas for each session based on
1101		the carbonate standards defined in both `self.Nominal_d13C_VPDB` and
1102		`self.Nominal_d18O_VPDB`.
1103		'''
1104
1105		self.msg('Computing WG composition:')
1106
1107		if a18_acid is None:
1108			a18_acid = self.ALPHA_18O_ACID_REACTION
1109		if samples is None:
1110			samples = [s for s in self.Nominal_d13C_VPDB if s in self.Nominal_d18O_VPDB]
1111
1112		assert a18_acid, f'Acid fractionation factor should not be zero.'
1113
1114		samples = [s for s in samples if s in self.Nominal_d13C_VPDB and s in self.Nominal_d18O_VPDB]
1115		R45R46_standards = {}
1116		for sample in samples:
1117			d13C_vpdb = self.Nominal_d13C_VPDB[sample]
1118			d18O_vpdb = self.Nominal_d18O_VPDB[sample]
1119			R13_s = self.R13_VPDB * (1 + d13C_vpdb / 1000)
1120			R17_s = self.R17_VPDB * ((1 + d18O_vpdb / 1000) * a18_acid) ** self.LAMBDA_17
1121			R18_s = self.R18_VPDB * (1 + d18O_vpdb / 1000) * a18_acid
1122
1123			C12_s = 1 / (1 + R13_s)
1124			C13_s = R13_s / (1 + R13_s)
1125			C16_s = 1 / (1 + R17_s + R18_s)
1126			C17_s = R17_s / (1 + R17_s + R18_s)
1127			C18_s = R18_s / (1 + R17_s + R18_s)
1128
1129			C626_s = C12_s * C16_s ** 2
1130			C627_s = 2 * C12_s * C16_s * C17_s
1131			C628_s = 2 * C12_s * C16_s * C18_s
1132			C636_s = C13_s * C16_s ** 2
1133			C637_s = 2 * C13_s * C16_s * C17_s
1134			C727_s = C12_s * C17_s ** 2
1135
1136			R45_s = (C627_s + C636_s) / C626_s
1137			R46_s = (C628_s + C637_s + C727_s) / C626_s
1138			R45R46_standards[sample] = (R45_s, R46_s)
1139		
1140		for s in self.sessions:
1141			db = [r for r in self.sessions[s]['data'] if r['Sample'] in samples]
1142			assert db, f'No sample from {samples} found in session "{s}".'
1143# 			dbsamples = sorted({r['Sample'] for r in db})
1144
1145			X = [r['d45'] for r in db]
1146			Y = [R45R46_standards[r['Sample']][0] for r in db]
1147			x1, x2 = np.min(X), np.max(X)
1148
1149			if x1 < x2:
1150				wgcoord = x1/(x1-x2)
1151			else:
1152				wgcoord = 999
1153
1154			if wgcoord < -.5 or wgcoord > 1.5:
1155				# unreasonable to extrapolate to d45 = 0
1156				R45_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
1157			else :
1158				# d45 = 0 is reasonably well bracketed
1159				R45_wg = np.polyfit(X, Y, 1)[1]
1160
1161			X = [r['d46'] for r in db]
1162			Y = [R45R46_standards[r['Sample']][1] for r in db]
1163			x1, x2 = np.min(X), np.max(X)
1164
1165			if x1 < x2:
1166				wgcoord = x1/(x1-x2)
1167			else:
1168				wgcoord = 999
1169
1170			if wgcoord < -.5 or wgcoord > 1.5:
1171				# unreasonable to extrapolate to d46 = 0
1172				R46_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
1173			else :
1174				# d46 = 0 is reasonably well bracketed
1175				R46_wg = np.polyfit(X, Y, 1)[1]
1176
1177			d13Cwg_VPDB, d18Owg_VSMOW = self.compute_bulk_delta(R45_wg, R46_wg)
1178
1179			self.msg(f'Session {s} WG:   δ13C_VPDB = {d13Cwg_VPDB:.3f}   δ18O_VSMOW = {d18Owg_VSMOW:.3f}')
1180
1181			self.sessions[s]['d13Cwg_VPDB'] = d13Cwg_VPDB
1182			self.sessions[s]['d18Owg_VSMOW'] = d18Owg_VSMOW
1183			for r in self.sessions[s]['data']:
1184				r['d13Cwg_VPDB'] = d13Cwg_VPDB
1185				r['d18Owg_VSMOW'] = d18Owg_VSMOW

Compute bulk composition of the working gas for each session based on the carbonate standards defined in both self.Nominal_d13C_VPDB and self.Nominal_d18O_VPDB.
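
As a sketch, both optional arguments may be overridden (the fractionation factor shown is purely illustrative):

```py
# Sketch: compute WG composition using only two of the standards, with a
# user-supplied 18O/16O acid fractionation factor (illustrative value).
mydata.wg(samples = ['ETH-1', 'ETH-2'], a18_acid = 1.00813)
```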

def compute_bulk_delta(self, R45, R46, D17O=0):
1188	def compute_bulk_delta(self, R45, R46, D17O = 0):
1189		'''
1190		Compute δ13C_VPDB and δ18O_VSMOW,
1191		by solving the generalized form of equation (17) from
1192		[Brand et al. (2010)](https://doi.org/10.1351/PAC-REP-09-01-05),
1193		assuming that δ18O_VSMOW is not too big (0 ± 50 ‰) and
1194		solving the corresponding second-order Taylor polynomial.
1195		(Appendix A of [Daëron et al., 2016](https://doi.org/10.1016/j.chemgeo.2016.08.014))
1196		'''
1197
1198		K = np.exp(D17O / 1000) * self.R17_VSMOW * self.R18_VSMOW ** -self.LAMBDA_17
1199
1200		A = -3 * K ** 2 * self.R18_VSMOW ** (2 * self.LAMBDA_17)
1201		B = 2 * K * R45 * self.R18_VSMOW ** self.LAMBDA_17
1202		C = 2 * self.R18_VSMOW
1203		D = -R46
1204
1205		aa = A * self.LAMBDA_17 * (2 * self.LAMBDA_17 - 1) + B * self.LAMBDA_17 * (self.LAMBDA_17 - 1) / 2
1206		bb = 2 * A * self.LAMBDA_17 + B * self.LAMBDA_17 + C
1207		cc = A + B + C + D
1208
1209		d18O_VSMOW = 1000 * (-bb + (bb ** 2 - 4 * aa * cc) ** .5) / (2 * aa)
1210
1211		R18 = (1 + d18O_VSMOW / 1000) * self.R18_VSMOW
1212		R17 = K * R18 ** self.LAMBDA_17
1213		R13 = R45 - 2 * R17
1214
1215		d13C_VPDB = 1000 * (R13 / self.R13_VPDB - 1)
1216
1217		return d13C_VPDB, d18O_VSMOW

Compute δ13C_VPDB and δ18O_VSMOW, by solving the generalized form of equation (17) from Brand et al. (2010), assuming that δ18O_VSMOW is not too big (0 ± 50 ‰) and solving the corresponding second-order Taylor polynomial. (Appendix A of Daëron et al., 2016)
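
A minimal usage sketch, with isobar ratios of a plausible order of magnitude (the exact values are illustrative):

```py
# Sketch: recover bulk composition from measured isobar ratios.
d13C_VPDB, d18O_VSMOW = mydata.compute_bulk_delta(R45 = 0.01200, R46 = 0.00415)
```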

@make_verbal
def crunch(self, verbose=''):
1220	@make_verbal
1221	def crunch(self, verbose = ''):
1222		'''
1223		Compute bulk composition and raw clumped isotope anomalies for all analyses.
1224		'''
1225		for r in self:
1226			self.compute_bulk_and_clumping_deltas(r)
1227		self.standardize_d13C()
1228		self.standardize_d18O()
1229		self.msg(f"Crunched {len(self)} analyses.")

Compute bulk composition and raw clumped isotope anomalies for all analyses.

def fill_in_missing_info(self, session='mySession'):
1232	def fill_in_missing_info(self, session = 'mySession'):
1233		'''
1234		Fill in optional fields with default values
1235		'''
1236		for i,r in enumerate(self):
1237			if 'D17O' not in r:
1238				r['D17O'] = 0.
1239			if 'UID' not in r:
1240				r['UID'] = f'{i+1}'
1241			if 'Session' not in r:
1242				r['Session'] = session
1243			for k in ['d47', 'd48', 'd49']:
1244				if k not in r:
1245					r[k] = np.nan

Fill in optional fields with default values

def standardize_d13C(self):
1248	def standardize_d13C(self):
1249		'''
1250		Perform δ13C standardization within each session `s` according to
1251		`self.sessions[s]['d13C_standardization_method']`, which is defined by default
1252		by `D47data.refresh_sessions()` as equal to `self.d13C_STANDARDIZATION_METHOD`,
1253		but may be redefined arbitrarily at a later stage.
1254		'''
1255		for s in self.sessions:
1256			if self.sessions[s]['d13C_standardization_method'] in ['1pt', '2pt']:
1257				XY = [(r['d13C_VPDB'], self.Nominal_d13C_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d13C_VPDB]
1258				X,Y = zip(*XY)
1259				if self.sessions[s]['d13C_standardization_method'] == '1pt':
1260					offset = np.mean(Y) - np.mean(X)
1261					for r in self.sessions[s]['data']:
1262						r['d13C_VPDB'] += offset				
1263				elif self.sessions[s]['d13C_standardization_method'] == '2pt':
1264					a,b = np.polyfit(X,Y,1)
1265					for r in self.sessions[s]['data']:
1266						r['d13C_VPDB'] = a * r['d13C_VPDB'] + b

Perform δ13C standardization within each session s according to self.sessions[s]['d13C_standardization_method'], which is defined by default by D47data.refresh_sessions() as equal to self.d13C_STANDARDIZATION_METHOD, but may be redefined arbitrarily at a later stage.

def standardize_d18O(self):
1268	def standardize_d18O(self):
1269		'''
1270		Perform δ18O standardization within each session `s` according to
1271		`self.ALPHA_18O_ACID_REACTION` and `self.sessions[s]['d18O_standardization_method']`,
1272		which is defined by default by `D47data.refresh_sessions()` as equal to
1273		`self.d18O_STANDARDIZATION_METHOD`, but may be redefined arbitrarily at a later stage.
1274		'''
1275		for s in self.sessions:
1276			if self.sessions[s]['d18O_standardization_method'] in ['1pt', '2pt']:
1277				XY = [(r['d18O_VSMOW'], self.Nominal_d18O_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d18O_VPDB]
1278				X,Y = zip(*XY)
1279				Y = [(1000+y) * self.R18_VPDB * self.ALPHA_18O_ACID_REACTION / self.R18_VSMOW - 1000 for y in Y]
1280				if self.sessions[s]['d18O_standardization_method'] == '1pt':
1281					offset = np.mean(Y) - np.mean(X)
1282					for r in self.sessions[s]['data']:
1283						r['d18O_VSMOW'] += offset				
1284				elif self.sessions[s]['d18O_standardization_method'] == '2pt':
1285					a,b = np.polyfit(X,Y,1)
1286					for r in self.sessions[s]['data']:
1287						r['d18O_VSMOW'] = a * r['d18O_VSMOW'] + b

Perform δ18O standardization within each session s according to self.ALPHA_18O_ACID_REACTION and self.sessions[s]['d18O_standardization_method'], which is defined by default by D47data.refresh_sessions() as equal to self.d18O_STANDARDIZATION_METHOD, but may be redefined arbitrarily at a later stage.

def compute_bulk_and_clumping_deltas(self, r):
1290	def compute_bulk_and_clumping_deltas(self, r):
1291		'''
1292		Compute δ13C_VPDB, δ18O_VSMOW, and raw Δ47, Δ48, Δ49 values for a single analysis `r`.
1293		'''
1294
1295		# Compute working gas R13, R18, and isobar ratios
1296		R13_wg = self.R13_VPDB * (1 + r['d13Cwg_VPDB'] / 1000)
1297		R18_wg = self.R18_VSMOW * (1 + r['d18Owg_VSMOW'] / 1000)
1298		R45_wg, R46_wg, R47_wg, R48_wg, R49_wg = self.compute_isobar_ratios(R13_wg, R18_wg)
1299
1300		# Compute analyte isobar ratios
1301		R45 = (1 + r['d45'] / 1000) * R45_wg
1302		R46 = (1 + r['d46'] / 1000) * R46_wg
1303		R47 = (1 + r['d47'] / 1000) * R47_wg
1304		R48 = (1 + r['d48'] / 1000) * R48_wg
1305		R49 = (1 + r['d49'] / 1000) * R49_wg
1306
1307		r['d13C_VPDB'], r['d18O_VSMOW'] = self.compute_bulk_delta(R45, R46, D17O = r['D17O'])
1308		R13 = (1 + r['d13C_VPDB'] / 1000) * self.R13_VPDB
1309		R18 = (1 + r['d18O_VSMOW'] / 1000) * self.R18_VSMOW
1310
1311		# Compute stochastic isobar ratios of the analyte
1312		R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = self.compute_isobar_ratios(
1313			R13, R18, D17O = r['D17O']
1314		)
1315
1316		# Check that R45/R45stoch and R46/R46stoch are indistinguishable from 1,
1317		# and raise a warning if the corresponding anomalies exceed 0.05 ppm.
1318		if (R45 / R45stoch - 1) > 5e-8:
1319			self.vmsg(f'This is unexpected: R45/R45stoch - 1 = {1e6 * (R45 / R45stoch - 1):.3f} ppm')
1320		if (R46 / R46stoch - 1) > 5e-8:
1321			self.vmsg(f'This is unexpected: R46/R46stoch - 1 = {1e6 * (R46 / R46stoch - 1):.3f} ppm')
1322
1323		# Compute raw clumped isotope anomalies
1324		r['D47raw'] = 1000 * (R47 / R47stoch - 1)
1325		r['D48raw'] = 1000 * (R48 / R48stoch - 1)
1326		r['D49raw'] = 1000 * (R49 / R49stoch - 1)

Compute δ13C_VPDB, δ18O_VSMOW, and raw Δ47, Δ48, Δ49 values for a single analysis r.

def compute_isobar_ratios(self, R13, R18, D17O=0, D47=0, D48=0, D49=0):
1329	def compute_isobar_ratios(self, R13, R18, D17O=0, D47=0, D48=0, D49=0):
1330		'''
1331		Compute isobar ratios for a sample with isotopic ratios `R13` and `R18`,
1332		optionally accounting for non-zero values of Δ17O (`D17O`) and clumped isotope
1333		anomalies (`D47`, `D48`, `D49`), all expressed in permil.
1334		'''
1335
1336		# Compute R17
1337		R17 = self.R17_VSMOW * np.exp(D17O / 1000) * (R18 / self.R18_VSMOW) ** self.LAMBDA_17
1338
1339		# Compute isotope concentrations
1340		C12 = (1 + R13) ** -1
1341		C13 = C12 * R13
1342		C16 = (1 + R17 + R18) ** -1
1343		C17 = C16 * R17
1344		C18 = C16 * R18
1345
1346		# Compute stochastic isotopologue concentrations
1347		C626 = C16 * C12 * C16
1348		C627 = C16 * C12 * C17 * 2
1349		C628 = C16 * C12 * C18 * 2
1350		C636 = C16 * C13 * C16
1351		C637 = C16 * C13 * C17 * 2
1352		C638 = C16 * C13 * C18 * 2
1353		C727 = C17 * C12 * C17
1354		C728 = C17 * C12 * C18 * 2
1355		C737 = C17 * C13 * C17
1356		C738 = C17 * C13 * C18 * 2
1357		C828 = C18 * C12 * C18
1358		C838 = C18 * C13 * C18
1359
1360		# Compute stochastic isobar ratios
1361		R45 = (C636 + C627) / C626
1362		R46 = (C628 + C637 + C727) / C626
1363		R47 = (C638 + C728 + C737) / C626
1364		R48 = (C738 + C828) / C626
1365		R49 = C838 / C626
1366
1367		# Account for stochastic anomalies
1368		R47 *= 1 + D47 / 1000
1369		R48 *= 1 + D48 / 1000
1370		R49 *= 1 + D49 / 1000
1371
1372		# Return isobar ratios
1373		return R45, R46, R47, R48, R49

Compute isobar ratios for a sample with isotopic ratios R13 and R18, optionally accounting for non-zero values of Δ17O (D17O) and clumped isotope anomalies (D47, D48, D49), all expressed in permil.
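
A usage sketch, for VPDB-like carbon and VSMOW-like oxygen with an illustrative clumped anomaly:

```py
# Sketch: isobar ratios for R13 = R13_VPDB, R18 = R18_VSMOW, and Δ47 = 0.3 ‰.
R45, R46, R47, R48, R49 = mydata.compute_isobar_ratios(
	R13 = 0.01118, R18 = 0.0020052, D47 = 0.3)
```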

def split_samples(self, samples_to_split='all', grouping='by_session'):
1376	def split_samples(self, samples_to_split = 'all', grouping = 'by_session'):
1377		'''
1378		Split unknown samples by UID (treat all analyses as different samples)
1379		or by session (treat analyses of a given sample in different sessions as
1380		different samples).
1381
1382		**Parameters**
1383
1384		+ `samples_to_split`: a list of samples to split, e.g., `['IAEA-C1', 'IAEA-C2']`
1385		+ `grouping`: `by_uid` | `by_session`
1386		'''
1387		if samples_to_split == 'all':
1388			samples_to_split = [s for s in self.unknowns]
1389		gkeys = {'by_uid':'UID', 'by_session':'Session'}
1390		self.grouping = grouping.lower()
1391		if self.grouping in gkeys:
1392			gkey = gkeys[self.grouping]
1393		for r in self:
1394			if r['Sample'] in samples_to_split:
1395				r['Sample_original'] = r['Sample']
1396				r['Sample'] = f"{r['Sample']}__{r[gkey]}"
1397			elif r['Sample'] in self.unknowns:
1398				r['Sample_original'] = r['Sample']
1399		self.refresh_samples()

Split unknown samples by UID (treat all analyses as different samples) or by session (treat analyses of a given sample in different sessions as different samples).

Parameters

  • samples_to_split: a list of samples to split, e.g., ['IAEA-C1', 'IAEA-C2']
  • grouping: by_uid | by_session
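
For example (a sketch reusing the tutorial's `MYSAMPLE-2`, assuming it was analyzed in more than one session):

```py
# Sketch: treat each session's analyses of MYSAMPLE-2 as a separate sample.
mydata.split_samples(['MYSAMPLE-2'], grouping = 'by_session')
```
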
def unsplit_samples(self, tables=False):
1402	def unsplit_samples(self, tables = False):
1403		'''
1404		Reverse the effects of `D47data.split_samples()`.
1405		
1406		This should only be used after `D4xdata.standardize()` with `method='pooled'`.
1407		
1408		After `D4xdata.standardize()` with `method='indep_sessions'`, one should
1409		probably use `D4xdata.combine_samples()` instead to reverse the effects of
1410		`D47data.split_samples()` with `grouping='by_uid'`, or `w_avg()` to reverse the
1411		effects of `D47data.split_samples()` with `grouping='by_sessions'` (because in
1412		that case session-averaged Δ4x values are statistically independent).
1413		'''
1414		unknowns_old = sorted({s for s in self.unknowns})
1415		CM_old = self.standardization.covar[:,:]
1416		VD_old = self.standardization.params.valuesdict().copy()
1417		vars_old = self.standardization.var_names
1418
1419		unknowns_new = sorted({r['Sample_original'] for r in self if 'Sample_original' in r})
1420
1421		Ns = len(vars_old) - len(unknowns_old)
1422		vars_new = vars_old[:Ns] + [f'D{self._4x}_{pf(u)}' for u in unknowns_new]
1423		VD_new = {k: VD_old[k] for k in vars_old[:Ns]}
1424
1425		W = np.zeros((len(vars_new), len(vars_old)))
1426		W[:Ns,:Ns] = np.eye(Ns)
1427		for u in unknowns_new:
1428			splits = sorted({r['Sample'] for r in self if 'Sample_original' in r and r['Sample_original'] == u})
1429			if self.grouping == 'by_session':
1430				weights = [self.samples[s][f'SE_D{self._4x}']**-2 for s in splits]
1431			elif self.grouping == 'by_uid':
1432				weights = [1 for s in splits]
1433			sw = sum(weights)
1434			weights = [w/sw for w in weights]
1435			W[vars_new.index(f'D{self._4x}_{pf(u)}'),[vars_old.index(f'D{self._4x}_{pf(s)}') for s in splits]] = weights[:]
1436
1437		CM_new = W @ CM_old @ W.T
1438		V = W @ np.array([[VD_old[k]] for k in vars_old])
1439		VD_new = {k:v[0] for k,v in zip(vars_new, V)}
1440
1441		self.standardization.covar = CM_new
1442		self.standardization.params.valuesdict = lambda : VD_new
1443		self.standardization.var_names = vars_new
1444
1445		for r in self:
1446			if r['Sample'] in self.unknowns:
1447				r['Sample_split'] = r['Sample']
1448				r['Sample'] = r['Sample_original']
1449
1450		self.refresh_samples()
1451		self.consolidate_samples()
1452		self.repeatabilities()
1453
1454		if tables:
1455			self.table_of_analyses()
1456			self.table_of_samples()

Reverse the effects of D47data.split_samples().

This should only be used after D4xdata.standardize() with method='pooled'.

After D4xdata.standardize() with method='indep_sessions', one should probably use D4xdata.combine_samples() instead to reverse the effects of D47data.split_samples() with grouping='by_uid', or w_avg() to reverse the effects of D47data.split_samples() with grouping='by_sessions' (because in that case session-averaged Δ4x values are statistically independent).
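
A hedged sketch of the intended workflow (split, standardize with the default pooled method, inspect, then merge back):

```py
# Sketch: check between-session consistency of all unknowns, then merge back.
mydata.split_samples(samples_to_split = 'all', grouping = 'by_session')
mydata.standardize()        # default method = 'pooled'
mydata.table_of_samples()   # inspect per-session Δ47 values
mydata.unsplit_samples()
```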

def assign_timestamps(self):
1458	def assign_timestamps(self):
1459		'''
1460		Assign a time field `t` of type `float` to each analysis.
1461
1462		If `TimeTag` is one of the data fields, `t` is equal within a given session
1463		to `TimeTag` minus the mean value of `TimeTag` for that session.
1464		Otherwise, `TimeTag` is by default equal to the index of each analysis
1465		in the dataset and `t` is defined as above.
1466		'''
1467		for session in self.sessions:
1468			sdata = self.sessions[session]['data']
1469			try:
1470				t0 = np.mean([r['TimeTag'] for r in sdata])
1471				for r in sdata:
1472					r['t'] = r['TimeTag'] - t0
1473			except KeyError:
1474				t0 = (len(sdata)-1)/2
1475				for t,r in enumerate(sdata):
1476					r['t'] = t - t0

Assign a time field t of type float to each analysis.

If TimeTag is one of the data fields, t is equal within a given session to TimeTag minus the mean value of TimeTag for that session. Otherwise, TimeTag is by default equal to the index of each analysis in the dataset and t is defined as above.

def report(self):
1479	def report(self):
1480		'''
1481		Prints a report on the standardization fit.
1482		Only applicable after `D4xdata.standardize(method='pooled')`.
1483		'''
1484		report_fit(self.standardization)

Prints a report on the standardization fit. Only applicable after D4xdata.standardize(method='pooled').

def combine_samples(self, sample_groups):
1487	def combine_samples(self, sample_groups):
1488		'''
1489		Combine analyses of different samples to compute weighted average Δ4x
1490		and new error (co)variances corresponding to the groups defined by the `sample_groups`
1491		dictionary.
1492		
1493		Caution: samples are weighted by number of replicate analyses, which is a
1494		reasonable default behavior but is not always optimal (e.g., in the case of strongly
1495		correlated analytical errors for one or more samples).
1496		
1497		Returns a tuple of:
1498		
1499		+ the list of group names
1500		+ an array of the corresponding Δ4x values
1501		+ the corresponding (co)variance matrix
1502		
1503		**Parameters**
1504
1505		+ `sample_groups`: a dictionary of the form:
1506		```py
1507		{'group1': ['sample_1', 'sample_2'],
1508		 'group2': ['sample_3', 'sample_4', 'sample_5']}
1509		```
1510		'''
1511		
1512		samples = [s for k in sorted(sample_groups.keys()) for s in sorted(sample_groups[k])]
1513		groups = sorted(sample_groups.keys())
1514		group_total_weights = {k: sum([self.samples[s]['N'] for s in sample_groups[k]]) for k in groups}
1515		D4x_old = np.array([[self.samples[x][f'D{self._4x}']] for x in samples])
1516		CM_old = np.array([[self.sample_D4x_covar(x,y) for x in samples] for y in samples])
1517		W = np.array([
1518			[self.samples[i]['N']/group_total_weights[j] if i in sample_groups[j] else 0 for i in samples]
1519			for j in groups])
1520		D4x_new = W @ D4x_old
1521		CM_new = W @ CM_old @ W.T
1522
1523		return groups, D4x_new[:,0], CM_new

Combine analyses of different samples to compute weighted average Δ4x and new error (co)variances corresponding to the groups defined by the sample_groups dictionary.

Caution: samples are weighted by number of replicate analyses, which is a reasonable default behavior but is not always optimal (e.g., in the case of strongly correlated analytical errors for one or more samples).

Returns a tuple of:

  • the list of group names
  • an array of the corresponding Δ4x values
  • the corresponding (co)variance matrix

Parameters

  • sample_groups: a dictionary of the form:
{'group1': ['sample_1', 'sample_2'],
 'group2': ['sample_3', 'sample_4', 'sample_5']}
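
For instance, a sketch (after `mydata.standardize()` has been called) combining the tutorial's two unknowns into one group:

```py
# Sketch: weighted average of MYSAMPLE-1 and MYSAMPLE-2 treated as one group.
groups, D47_new, CM_new = mydata.combine_samples(
	{'combined': ['MYSAMPLE-1', 'MYSAMPLE-2']})
```
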
@make_verbal
def standardize(self, method='pooled', weighted_sessions=[], consolidate=True, consolidate_tables=False, consolidate_plots=False, constraints={}):
1526	@make_verbal
1527	def standardize(self,
1528		method = 'pooled',
1529		weighted_sessions = [],
1530		consolidate = True,
1531		consolidate_tables = False,
1532		consolidate_plots = False,
1533		constraints = {},
1534		):
1535		'''
1536		Compute absolute Δ4x values for all replicate analyses and for sample averages.
1537		If `method` argument is set to `'pooled'`, the standardization processes all sessions
1538		in a single step, assuming that all samples (anchors and unknowns alike) are homogeneous,
1539		i.e. that their true Δ4x value does not change between sessions,
1540		([Daëron, 2021](https://doi.org/10.1029/2020GC009592)). If `method` argument is set to
1541		`'indep_sessions'`, the standardization processes each session independently, based only
1542		on anchors analyses.
1543		'''
1544
1545		self.standardization_method = method
1546		self.assign_timestamps()
1547
1548		if method == 'pooled':
1549			if weighted_sessions:
1550				for session_group in weighted_sessions:
1551					if self._4x == '47':
1552						X = D47data([r for r in self if r['Session'] in session_group])
1553					elif self._4x == '48':
1554						X = D48data([r for r in self if r['Session'] in session_group])
1555					X.Nominal_D4x = self.Nominal_D4x.copy()
1556					X.refresh()
1557					result = X.standardize(method = 'pooled', weighted_sessions = [], consolidate = False)
1558					w = np.sqrt(result.redchi)
1559					self.msg(f'Session group {session_group} MRSWD = {w:.4f}')
1560					for r in X:
1561						r[f'wD{self._4x}raw'] *= w
1562			else:
1563				self.msg(f'All D{self._4x}raw weights set to 1 ‰')
1564				for r in self:
1565					r[f'wD{self._4x}raw'] = 1.
1566
1567			params = Parameters()
1568			for k,session in enumerate(self.sessions):
1569				self.msg(f"Session {session}: scrambling_drift is {self.sessions[session]['scrambling_drift']}.")
1570				self.msg(f"Session {session}: slope_drift is {self.sessions[session]['slope_drift']}.")
1571				self.msg(f"Session {session}: wg_drift is {self.sessions[session]['wg_drift']}.")
1572				s = pf(session)
1573				params.add(f'a_{s}', value = 0.9)
1574				params.add(f'b_{s}', value = 0.)
1575				params.add(f'c_{s}', value = -0.9)
1576				params.add(f'a2_{s}', value = 0.,
1577# 					vary = self.sessions[session]['scrambling_drift'],
1578					)
1579				params.add(f'b2_{s}', value = 0.,
1580# 					vary = self.sessions[session]['slope_drift'],
1581					)
1582				params.add(f'c2_{s}', value = 0.,
1583# 					vary = self.sessions[session]['wg_drift'],
1584					)
1585				if not self.sessions[session]['scrambling_drift']:
1586					params[f'a2_{s}'].expr = '0'
1587				if not self.sessions[session]['slope_drift']:
1588					params[f'b2_{s}'].expr = '0'
1589				if not self.sessions[session]['wg_drift']:
1590					params[f'c2_{s}'].expr = '0'
1591
1592			for sample in self.unknowns:
1593				params.add(f'D{self._4x}_{pf(sample)}', value = 0.5)
1594
1595			for k in constraints:
1596				params[k].expr = constraints[k]
1597
1598			def residuals(p):
1599				R = []
1600				for r in self:
1601					session = pf(r['Session'])
1602					sample = pf(r['Sample'])
1603					if r['Sample'] in self.Nominal_D4x:
1604						R += [ (
1605							r[f'D{self._4x}raw'] - (
1606								p[f'a_{session}'] * self.Nominal_D4x[r['Sample']]
1607								+ p[f'b_{session}'] * r[f'd{self._4x}']
1608								+	p[f'c_{session}']
1609								+ r['t'] * (
1610									p[f'a2_{session}'] * self.Nominal_D4x[r['Sample']]
1611									+ p[f'b2_{session}'] * r[f'd{self._4x}']
1612									+	p[f'c2_{session}']
1613									)
1614								)
1615							) / r[f'wD{self._4x}raw'] ]
1616					else:
1617						R += [ (
1618							r[f'D{self._4x}raw'] - (
1619								p[f'a_{session}'] * p[f'D{self._4x}_{sample}']
1620								+ p[f'b_{session}'] * r[f'd{self._4x}']
1621								+	p[f'c_{session}']
1622								+ r['t'] * (
1623									p[f'a2_{session}'] * p[f'D{self._4x}_{sample}']
1624									+ p[f'b2_{session}'] * r[f'd{self._4x}']
1625									+	p[f'c2_{session}']
1626									)
1627								)
1628							) / r[f'wD{self._4x}raw'] ]
1629				return R
1630
1631			M = Minimizer(residuals, params)
1632			result = M.least_squares()
1633			self.Nf = result.nfree
1634			self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
1635			new_names, new_covar, new_se = _fullcovar(result)[:3]
1636			result.var_names = new_names
1637			result.covar = new_covar
1638
1639			for r in self:
1640				s = pf(r["Session"])
1641				a = result.params.valuesdict()[f'a_{s}']
1642				b = result.params.valuesdict()[f'b_{s}']
1643				c = result.params.valuesdict()[f'c_{s}']
1644				a2 = result.params.valuesdict()[f'a2_{s}']
1645				b2 = result.params.valuesdict()[f'b2_{s}']
1646				c2 = result.params.valuesdict()[f'c2_{s}']
1647				r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
1648				
1649
1650			self.standardization = result
1651
1652			for session in self.sessions:
1653				self.sessions[session]['Np'] = 3
1654				for k in ['scrambling', 'slope', 'wg']:
1655					if self.sessions[session][f'{k}_drift']:
1656						self.sessions[session]['Np'] += 1
1657
1658			if consolidate:
1659				self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
1660			return result
1661
1662
1663		elif method == 'indep_sessions':
1664
1665			if weighted_sessions:
1666				for session_group in weighted_sessions:
1667					X = D4xdata([r for r in self if r['Session'] in session_group], mass = self._4x)
1668					X.Nominal_D4x = self.Nominal_D4x.copy()
1669					X.refresh()
1670					# This is only done to assign r['wD47raw'] for r in X:
1671					X.standardize(method = method, weighted_sessions = [], consolidate = False)
1672					self.msg(f'D{self._4x}raw weights set to {1000*X[0][f"wD{self._4x}raw"]:.1f} ppm for sessions in {session_group}')
1673			else:
1674				self.msg('All weights set to 1 ‰')
1675				for r in self:
1676					r[f'wD{self._4x}raw'] = 1
1677
1678			for session in self.sessions:
1679				s = self.sessions[session]
1680				p_names = ['a', 'b', 'c', 'a2', 'b2', 'c2']
1681				p_active = [True, True, True, s['scrambling_drift'], s['slope_drift'], s['wg_drift']]
1682				s['Np'] = sum(p_active)
1683				sdata = s['data']
1684
1685				A = np.array([
1686					[
1687						self.Nominal_D4x[r['Sample']] / r[f'wD{self._4x}raw'],
1688						r[f'd{self._4x}'] / r[f'wD{self._4x}raw'],
1689						1 / r[f'wD{self._4x}raw'],
1690						self.Nominal_D4x[r['Sample']] * r['t'] / r[f'wD{self._4x}raw'],
1691						r[f'd{self._4x}'] * r['t'] / r[f'wD{self._4x}raw'],
1692						r['t'] / r[f'wD{self._4x}raw']
1693						]
1694					for r in sdata if r['Sample'] in self.anchors
1695					])[:,p_active] # only keep columns for the active parameters
1696				Y = np.array([[r[f'D{self._4x}raw'] / r[f'wD{self._4x}raw']] for r in sdata if r['Sample'] in self.anchors])
1697				s['Na'] = Y.size
1698				CM = linalg.inv(A.T @ A)
1699				bf = (CM @ A.T @ Y).T[0,:]
1700				k = 0
1701				for n,a in zip(p_names, p_active):
1702					if a:
1703						s[n] = bf[k]
1704# 						self.msg(f'{n} = {bf[k]}')
1705						k += 1
1706					else:
1707						s[n] = 0.
1708# 						self.msg(f'{n} = 0.0')
1709
1710				for r in sdata :
1711					a, b, c, a2, b2, c2 = s['a'], s['b'], s['c'], s['a2'], s['b2'], s['c2']
1712					r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
1713					r[f'wD{self._4x}'] = r[f'wD{self._4x}raw'] / (a + a2 * r['t'])
1714
1715				s['CM'] = np.zeros((6,6))
1716				i = 0
1717				k_active = [j for j,a in enumerate(p_active) if a]
1718				for j,a in enumerate(p_active):
1719					if a:
1720						s['CM'][j,k_active] = CM[i,:]
1721						i += 1
1722
1723			if not weighted_sessions:
1724				w = self.rmswd()['rmswd']
1725				for r in self:
1726						r[f'wD{self._4x}'] *= w
1727						r[f'wD{self._4x}raw'] *= w
1728				for session in self.sessions:
1729					self.sessions[session]['CM'] *= w**2
1730
1731			for session in self.sessions:
1732				s = self.sessions[session]
1733				s['SE_a'] = s['CM'][0,0]**.5
1734				s['SE_b'] = s['CM'][1,1]**.5
1735				s['SE_c'] = s['CM'][2,2]**.5
1736				s['SE_a2'] = s['CM'][3,3]**.5
1737				s['SE_b2'] = s['CM'][4,4]**.5
1738				s['SE_c2'] = s['CM'][5,5]**.5
1739
1740			if not weighted_sessions:
1741				self.Nf = len(self) - len(self.unknowns) - np.sum([self.sessions[s]['Np'] for s in self.sessions])
1742			else:
1743				self.Nf = 0
1744				for sg in weighted_sessions:
1745					self.Nf += self.rmswd(sessions = sg)['Nf']
1746
1747			self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
1748
1749			avgD4x = {
1750				sample: np.mean([r[f'D{self._4x}'] for r in self if r['Sample'] == sample])
1751				for sample in self.samples
1752				}
1753			chi2 = np.sum([(r[f'D{self._4x}'] - avgD4x[r['Sample']])**2 for r in self])
1754			rD4x = (chi2/self.Nf)**.5
1755			self.repeatability[f'sigma_{self._4x}'] = rD4x
1756
1757			if consolidate:
1758				self.consolidate(tables = consolidate_tables, plots = consolidate_plots)

Compute absolute Δ4x values for all replicate analyses and for sample averages. If method argument is set to 'pooled', the standardization processes all sessions in a single step, assuming that all samples (anchors and unknowns alike) are homogeneous, i.e. that their true Δ4x value does not change between sessions, (Daëron, 2021). If method argument is set to 'indep_sessions', the standardization processes each session independently, based only on anchors analyses.
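
As a sketch (assuming a session named 'Session01' exists), session-specific drift terms may be enabled before standardizing:

```py
# Sketch: allow the standardization slope to drift within 'Session01'
# (a hypothetical session name), then run a pooled standardization.
mydata.sessions['Session01']['slope_drift'] = True
mydata.standardize(method = 'pooled')
```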

def standardization_error(self, session, d4x, D4x, t=0):
1761	def standardization_error(self, session, d4x, D4x, t = 0):
1762		'''
1763		Compute standardization error for a given session and
1764		(δ47, Δ47) composition.
1765		'''
1766		a = self.sessions[session]['a']
1767		b = self.sessions[session]['b']
1768		c = self.sessions[session]['c']
1769		a2 = self.sessions[session]['a2']
1770		b2 = self.sessions[session]['b2']
1771		c2 = self.sessions[session]['c2']
1772		CM = self.sessions[session]['CM']
1773
1774		x, y = D4x, d4x
1775		z = a * x + b * y + c + a2 * x * t + b2 * y * t + c2 * t
1776# 		x = (z - b*y - b2*y*t - c - c2*t) / (a+a2*t)
1777		dxdy = -(b+b2*t) / (a+a2*t)
1778		dxdz = 1. / (a+a2*t)
1779		dxda = -x / (a+a2*t)
1780		dxdb = -y / (a+a2*t)
1781		dxdc = -1. / (a+a2*t)
1782		dxda2 = -x * t / (a+a2*t)
1783		dxdb2 = -y * t / (a+a2*t)
1784		dxdc2 = -t / (a+a2*t)
1785		V = np.array([dxda, dxdb, dxdc, dxda2, dxdb2, dxdc2])
1786		sx = (V @ CM @ V.T) ** .5
1787		return sx

Compute standardization error for a given session and (δ47, Δ47) composition.
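
As a sketch of direct use (assuming `mydata` has already been standardized, so that the session parameters and covariance matrix `CM` are available), the standardization error can be evaluated at any (δ47, Δ47) composition:

```python
# 1SE standardization error at d47 = 20 ‰, D47 = 0.6 ‰ (t = 0 by default):
for session in mydata.sessions:
    sigma = mydata.standardization_error(session, 20.0, 0.6)
    print(f'{session}: {1000 * sigma:.1f} ppm')
```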

@make_verbal
def summary(self, dir='output', filename=None, save_to_file=True, print_out=True):
1790	@make_verbal
1791	def summary(self,
1792		dir = 'output',
1793		filename = None,
1794		save_to_file = True,
1795		print_out = True,
1796		):
1797		'''
1798		Print out and/or save to disk a summary of the standardization results.
1799
1800		**Parameters**
1801
1802		+ `dir`: the directory in which to save the table
1803		+ `filename`: the name of the csv file to write to
1804		+ `save_to_file`: whether to save the table to disk
1805		+ `print_out`: whether to print out the table
1806		'''
1807
1808		out = []
1809		out += [['N samples (anchors + unknowns)', f"{len(self.samples)} ({len(self.anchors)} + {len(self.unknowns)})"]]
1810		out += [['N analyses (anchors + unknowns)', f"{len(self)} ({len([r for r in self if r['Sample'] in self.anchors])} + {len([r for r in self if r['Sample'] in self.unknowns])})"]]
1811		out += [['Repeatability of δ13C_VPDB', f"{1000 * self.repeatability['r_d13C_VPDB']:.1f} ppm"]]
1812		out += [['Repeatability of δ18O_VSMOW', f"{1000 * self.repeatability['r_d18O_VSMOW']:.1f} ppm"]]
1813		out += [[f'Repeatability of Δ{self._4x} (anchors)', f"{1000 * self.repeatability[f'r_D{self._4x}a']:.1f} ppm"]]
1814		out += [[f'Repeatability of Δ{self._4x} (unknowns)', f"{1000 * self.repeatability[f'r_D{self._4x}u']:.1f} ppm"]]
1815		out += [[f'Repeatability of Δ{self._4x} (all)', f"{1000 * self.repeatability[f'r_D{self._4x}']:.1f} ppm"]]
1816		out += [['Model degrees of freedom', f"{self.Nf}"]]
1817		out += [['Student\'s 95% t-factor', f"{self.t95:.2f}"]]
1818		out += [['Standardization method', self.standardization_method]]
1819
1820		if save_to_file:
1821			if not os.path.exists(dir):
1822				os.makedirs(dir)
1823			if filename is None:
1824				filename = f'D{self._4x}_summary.csv'
1825			with open(f'{dir}/{filename}', 'w') as fid:
1826				fid.write(make_csv(out))
1827		if print_out:
1828			self.msg('\n' + pretty_table(out, header = 0))

Print out and/or save to disk a summary of the standardization results.

Parameters

  • dir: the directory in which to save the table
  • filename: the name of the csv file to write to
  • save_to_file: whether to save the table to disk
  • print_out: whether to print out the table
@make_verbal
def table_of_sessions(self, dir='output', filename=None, save_to_file=True, print_out=True, output=None):
1831	@make_verbal
1832	def table_of_sessions(self,
1833		dir = 'output',
1834		filename = None,
1835		save_to_file = True,
1836		print_out = True,
1837		output = None,
1838		):
1839		'''
1840		Print out and/or save to disk a table of sessions.
1841
1842		**Parameters**
1843
1844		+ `dir`: the directory in which to save the table
1845		+ `filename`: the name of the csv file to write to
1846		+ `save_to_file`: whether to save the table to disk
1847		+ `print_out`: whether to print out the table
1848		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
1849		    if set to `'raw'`: return a list of list of strings
1850		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
1851		'''
1852		include_a2 = any([self.sessions[session]['scrambling_drift'] for session in self.sessions])
1853		include_b2 = any([self.sessions[session]['slope_drift'] for session in self.sessions])
1854		include_c2 = any([self.sessions[session]['wg_drift'] for session in self.sessions])
1855
1856		out = [['Session','Na','Nu','d13Cwg_VPDB','d18Owg_VSMOW','r_d13C','r_d18O',f'r_D{self._4x}','a ± SE','1e3 x b ± SE','c ± SE']]
1857		if include_a2:
1858			out[-1] += ['a2 ± SE']
1859		if include_b2:
1860			out[-1] += ['b2 ± SE']
1861		if include_c2:
1862			out[-1] += ['c2 ± SE']
1863		for session in self.sessions:
1864			out += [[
1865				session,
1866				f"{self.sessions[session]['Na']}",
1867				f"{self.sessions[session]['Nu']}",
1868				f"{self.sessions[session]['d13Cwg_VPDB']:.3f}",
1869				f"{self.sessions[session]['d18Owg_VSMOW']:.3f}",
1870				f"{self.sessions[session]['r_d13C_VPDB']:.4f}",
1871				f"{self.sessions[session]['r_d18O_VSMOW']:.4f}",
1872				f"{self.sessions[session][f'r_D{self._4x}']:.4f}",
1873				f"{self.sessions[session]['a']:.3f} ± {self.sessions[session]['SE_a']:.3f}",
1874				f"{1e3*self.sessions[session]['b']:.3f} ± {1e3*self.sessions[session]['SE_b']:.3f}",
1875				f"{self.sessions[session]['c']:.3f} ± {self.sessions[session]['SE_c']:.3f}",
1876				]]
1877			if include_a2:
1878				if self.sessions[session]['scrambling_drift']:
1879					out[-1] += [f"{self.sessions[session]['a2']:.1e} ± {self.sessions[session]['SE_a2']:.1e}"]
1880				else:
1881					out[-1] += ['']
1882			if include_b2:
1883				if self.sessions[session]['slope_drift']:
1884					out[-1] += [f"{self.sessions[session]['b2']:.1e} ± {self.sessions[session]['SE_b2']:.1e}"]
1885				else:
1886					out[-1] += ['']
1887			if include_c2:
1888				if self.sessions[session]['wg_drift']:
1889					out[-1] += [f"{self.sessions[session]['c2']:.1e} ± {self.sessions[session]['SE_c2']:.1e}"]
1890				else:
1891					out[-1] += ['']
1892
1893		if save_to_file:
1894			if not os.path.exists(dir):
1895				os.makedirs(dir)
1896			if filename is None:
1897				filename = f'D{self._4x}_sessions.csv'
1898			with open(f'{dir}/{filename}', 'w') as fid:
1899				fid.write(make_csv(out))
1900		if print_out:
1901			self.msg('\n' + pretty_table(out))
1902		if output == 'raw':
1903			return out
1904		elif output == 'pretty':
1905			return pretty_table(out)

Print out and/or save to disk a table of sessions.

Parameters

  • dir: the directory in which to save the table
  • filename: the name of the csv file to write to
  • save_to_file: whether to save the table to disk
  • print_out: whether to print out the table
  • output: if set to 'pretty': return a pretty text table (see pretty_table()); if set to 'raw': return a list of list of strings (e.g., [['header1', 'header2'], ['0.1', '0.2']])
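
For example, the same table can be recovered programmatically (a sketch; `mydata` as before):

```python
# List-of-lists form, without writing a csv file or printing anything:
tbl = mydata.table_of_sessions(save_to_file = False, print_out = False, output = 'raw')
header, rows = tbl[0], tbl[1:]
```
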
@make_verbal
def table_of_analyses(self, dir='output', filename=None, save_to_file=True, print_out=True, output=None):
1908	@make_verbal
1909	def table_of_analyses(
1910		self,
1911		dir = 'output',
1912		filename = None,
1913		save_to_file = True,
1914		print_out = True,
1915		output = None,
1916		):
1917		'''
1918		Print out and/or save to disk a table of analyses.
1919
1920		**Parameters**
1921
1922		+ `dir`: the directory in which to save the table
1923		+ `filename`: the name of the csv file to write to
1924		+ `save_to_file`: whether to save the table to disk
1925		+ `print_out`: whether to print out the table
1926		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
1927		    if set to `'raw'`: return a list of list of strings
1928		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
1929		'''
1930
1931		out = [['UID','Session','Sample']]
1932		extra_fields = [f for f in [('SampleMass','.2f'),('ColdFingerPressure','.1f'),('AcidReactionYield','.3f')] if f[0] in {k for r in self for k in r}]
1933		for f in extra_fields:
1934			out[-1] += [f[0]]
1935		out[-1] += ['d13Cwg_VPDB','d18Owg_VSMOW','d45','d46','d47','d48','d49','d13C_VPDB','d18O_VSMOW','D47raw','D48raw','D49raw',f'D{self._4x}']
1936		for r in self:
1937			out += [[f"{r['UID']}",f"{r['Session']}",f"{r['Sample']}"]]
1938			for f in extra_fields:
1939				out[-1] += [f"{r[f[0]]:{f[1]}}"]
1940			out[-1] += [
1941				f"{r['d13Cwg_VPDB']:.3f}",
1942				f"{r['d18Owg_VSMOW']:.3f}",
1943				f"{r['d45']:.6f}",
1944				f"{r['d46']:.6f}",
1945				f"{r['d47']:.6f}",
1946				f"{r['d48']:.6f}",
1947				f"{r['d49']:.6f}",
1948				f"{r['d13C_VPDB']:.6f}",
1949				f"{r['d18O_VSMOW']:.6f}",
1950				f"{r['D47raw']:.6f}",
1951				f"{r['D48raw']:.6f}",
1952				f"{r['D49raw']:.6f}",
1953				f"{r[f'D{self._4x}']:.6f}"
1954				]
1955		if save_to_file:
1956			if not os.path.exists(dir):
1957				os.makedirs(dir)
1958			if filename is None:
1959				filename = f'D{self._4x}_analyses.csv'
1960			with open(f'{dir}/{filename}', 'w') as fid:
1961				fid.write(make_csv(out))
1962		if print_out:
1963			self.msg('\n' + pretty_table(out))
1964		return out

Print out and/or save to disk a table of analyses.

Parameters

  • dir: the directory in which to save the table
  • filename: the name of the csv file to write to
  • save_to_file: whether to save the table to disk
  • print_out: whether to print out the table
  • output: if set to 'pretty': return a pretty text table (see pretty_table()); if set to 'raw': return a list of list of strings (e.g., [['header1', 'header2'], ['0.1', '0.2']])
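
For instance, to write the full table of analyses to a custom csv file without echoing it to the console (a sketch; the file name is arbitrary):

```python
mydata.table_of_analyses(dir = 'output', filename = 'all_analyses.csv', print_out = False)
```
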
@make_verbal
def covar_table(self, correl=False, dir='output', filename=None, save_to_file=True, print_out=True, output=None):
1966	@make_verbal
1967	def covar_table(
1968		self,
1969		correl = False,
1970		dir = 'output',
1971		filename = None,
1972		save_to_file = True,
1973		print_out = True,
1974		output = None,
1975		):
1976		'''
1977		Print out, save to disk and/or return the variance-covariance matrix of D4x
1978		for all unknown samples.
1979
1980		**Parameters**
1981
1982		+ `dir`: the directory in which to save the csv
1983		+ `filename`: the name of the csv file to write to
1984		+ `save_to_file`: whether to save the csv
1985		+ `print_out`: whether to print out the matrix
1986		+ `output`: if set to `'pretty'`: return a pretty text matrix (see `pretty_table()`);
1987		    if set to `'raw'`: return a list of list of strings
1988		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
1989		'''
1990		samples = sorted([u for u in self.unknowns])
1991		out = [[''] + samples]
1992		for s1 in samples:
1993			out.append([s1])
1994			for s2 in samples:
1995				if correl:
1996					out[-1].append(f'{self.sample_D4x_correl(s1, s2):.6f}')
1997				else:
1998					out[-1].append(f'{self.sample_D4x_covar(s1, s2):.8e}')
1999
2000		if save_to_file:
2001			if not os.path.exists(dir):
2002				os.makedirs(dir)
2003			if filename is None:
2004				if correl:
2005					filename = f'D{self._4x}_correl.csv'
2006				else:
2007					filename = f'D{self._4x}_covar.csv'
2008			with open(f'{dir}/{filename}', 'w') as fid:
2009				fid.write(make_csv(out))
2010		if print_out:
2011			self.msg('\n'+pretty_table(out))
2012		if output == 'raw':
2013			return out
2014		elif output == 'pretty':
2015			return pretty_table(out)

Print out, save to disk and/or return the variance-covariance matrix of D4x for all unknown samples.

Parameters

  • dir: the directory in which to save the csv
  • filename: the name of the csv file to write to
  • save_to_file: whether to save the csv
  • print_out: whether to print out the matrix
  • output: if set to 'pretty': return a pretty text matrix (see pretty_table()); if set to 'raw': return a list of list of strings (e.g., [['header1', 'header2'], ['0.1', '0.2']])
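
A short sketch, again assuming a `D47data` object named `mydata`; the default file names follow from the code above:

```python
# Covariance matrix of the unknowns -> output/D47_covar.csv
mydata.covar_table(print_out = False)

# Corresponding correlation matrix -> output/D47_correl.csv
mydata.covar_table(correl = True, print_out = False)
```
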
@make_verbal
def table_of_samples(self, dir='output', filename=None, save_to_file=True, print_out=True, output=None):
2017	@make_verbal
2018	def table_of_samples(
2019		self,
2020		dir = 'output',
2021		filename = None,
2022		save_to_file = True,
2023		print_out = True,
2024		output = None,
2025		):
2026		'''
2027		Print out, save to disk and/or return a table of samples.
2028
2029		**Parameters**
2030
2031		+ `dir`: the directory in which to save the csv
2032		+ `filename`: the name of the csv file to write to
2033		+ `save_to_file`: whether to save the csv
2034		+ `print_out`: whether to print out the table
2035		+ `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
2036		    if set to `'raw'`: return a list of list of strings
2037		    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
2038		'''
2039
2040		out = [['Sample','N','d13C_VPDB','d18O_VSMOW',f'D{self._4x}','SE','95% CL','SD','p_Levene']]
2041		for sample in self.anchors:
2042			out += [[
2043				f"{sample}",
2044				f"{self.samples[sample]['N']}",
2045				f"{self.samples[sample]['d13C_VPDB']:.2f}",
2046				f"{self.samples[sample]['d18O_VSMOW']:.2f}",
2047				f"{self.samples[sample][f'D{self._4x}']:.4f}",'','',
2048				f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '', ''
2049				]]
2050		for sample in self.unknowns:
2051			out += [[
2052				f"{sample}",
2053				f"{self.samples[sample]['N']}",
2054				f"{self.samples[sample]['d13C_VPDB']:.2f}",
2055				f"{self.samples[sample]['d18O_VSMOW']:.2f}",
2056				f"{self.samples[sample][f'D{self._4x}']:.4f}",
2057				f"{self.samples[sample][f'SE_D{self._4x}']:.4f}",
2058				f"{self.samples[sample][f'SE_D{self._4x}'] * self.t95:.4f}",
2059				f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '',
2060				f"{self.samples[sample]['p_Levene']:.3f}" if self.samples[sample]['N'] > 2 else ''
2061				]]
2062		if save_to_file:
2063			if not os.path.exists(dir):
2064				os.makedirs(dir)
2065			if filename is None:
2066				filename = f'D{self._4x}_samples.csv'
2067			with open(f'{dir}/{filename}', 'w') as fid:
2068				fid.write(make_csv(out))
2069		if print_out:
2070			self.msg('\n'+pretty_table(out))
2071		if output == 'raw':
2072			return out
2073		elif output == 'pretty':
2074			return pretty_table(out)

Print out, save to disk and/or return a table of samples.

Parameters

  • dir: the directory in which to save the csv
  • filename: the name of the csv file to write to
  • save_to_file: whether to save the csv
  • print_out: whether to print out the table
  • output: if set to 'pretty': return a pretty text table (see pretty_table()); if set to 'raw': return a list of list of strings (e.g., [['header1', 'header2'], ['0.1', '0.2']])
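
For example (a sketch):

```python
# Print the final table of samples without saving it to disk:
print(mydata.table_of_samples(save_to_file = False, print_out = False, output = 'pretty'))
```
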
def plot_sessions(self, dir='output', figsize=(8, 8), filetype='pdf', dpi=100):
2077	def plot_sessions(self, dir = 'output', figsize = (8,8), filetype = 'pdf', dpi = 100):
2078		'''
2079		Generate session plots and save them to disk.
2080
2081		**Parameters**
2082
2083		+ `dir`: the directory in which to save the plots
2084		+ `figsize`: the width and height (in inches) of each plot
2085		+ `filetype`: 'pdf' or 'png'
2086		+ `dpi`: resolution for PNG output
2087		'''
2088		if not os.path.exists(dir):
2089			os.makedirs(dir)
2090
2091		for session in self.sessions:
2092			sp = self.plot_single_session(session, xylimits = 'constant')
2093			ppl.savefig(f'{dir}/D{self._4x}_plot_{session}.{filetype}', **({'dpi': dpi} if filetype.lower() == 'png' else {}))
2094			ppl.close(sp.fig)

Generate session plots and save them to disk.

Parameters

  • dir: the directory in which to save the plots
  • figsize: the width and height (in inches) of each plot
  • filetype: 'pdf' or 'png'
  • dpi: resolution for PNG output
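
For instance, to save one PNG file per session at a higher resolution (a sketch):

```python
mydata.plot_sessions(dir = 'output', filetype = 'png', dpi = 200)
```
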
@make_verbal
def consolidate_samples(self):
2097	@make_verbal
2098	def consolidate_samples(self):
2099		'''
2100		Compile various statistics for each sample.
2101
2102		For each anchor sample:
2103
2104		+ `D47` or `D48`: the nominal Δ4x value for this anchor, specified by `self.Nominal_D4x`
2105		+ `SE_D47` or `SE_D48`: set to zero by definition
2106
2107		For each unknown sample:
2108
2109		+ `D47` or `D48`: the standardized Δ4x value for this unknown
2110		+ `SE_D47` or `SE_D48`: the standard error of Δ4x for this unknown
2111
2112		For each anchor and unknown:
2113
2114		+ `N`: the total number of analyses of this sample
2115		+ `SD_D47` or `SD_D48`: the “sample” (in the statistical sense) standard deviation for this sample
2116		+ `d13C_VPDB`: the average δ13C_VPDB value for this sample
2117		+ `d18O_VSMOW`: the average δ18O_VSMOW value for this sample (as CO2)
2118		+ `p_Levene`: the p-value from a [Levene test](https://en.wikipedia.org/wiki/Levene%27s_test) of equal
2119		variance, indicating whether the Δ4x repeatability of this sample differs significantly from
2120		that observed for the reference sample specified by `self.LEVENE_REF_SAMPLE`.
2121		'''
2122		D4x_ref_pop = [r[f'D{self._4x}'] for r in self.samples[self.LEVENE_REF_SAMPLE]['data']]
2123		for sample in self.samples:
2124			self.samples[sample]['N'] = len(self.samples[sample]['data'])
2125			if self.samples[sample]['N'] > 1:
2126				self.samples[sample][f'SD_D{self._4x}'] = stdev([r[f'D{self._4x}'] for r in self.samples[sample]['data']])
2127
2128			self.samples[sample]['d13C_VPDB'] = np.mean([r['d13C_VPDB'] for r in self.samples[sample]['data']])
2129			self.samples[sample]['d18O_VSMOW'] = np.mean([r['d18O_VSMOW'] for r in self.samples[sample]['data']])
2130
2131			D4x_pop = [r[f'D{self._4x}'] for r in self.samples[sample]['data']]
2132			if len(D4x_pop) > 2:
2133				self.samples[sample]['p_Levene'] = levene(D4x_ref_pop, D4x_pop, center = 'median')[1]
2134			
2135		if self.standardization_method == 'pooled':
2136			for sample in self.anchors:
2137				self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
2138				self.samples[sample][f'SE_D{self._4x}'] = 0.
2139			for sample in self.unknowns:
2140				self.samples[sample][f'D{self._4x}'] = self.standardization.params.valuesdict()[f'D{self._4x}_{pf(sample)}']
2141				try:
2142					self.samples[sample][f'SE_D{self._4x}'] = self.sample_D4x_covar(sample)**.5
2143				except ValueError:
2144					# when `sample` is constrained by self.standardize(constraints = {...}),
2145					# it is no longer listed in self.standardization.var_names.
2146					# Temporary fix: define SE as zero for now
2147					self.samples[sample][f'SE_D{self._4x}'] = 0.
2148
2149		elif self.standardization_method == 'indep_sessions':
2150			for sample in self.anchors:
2151				self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
2152				self.samples[sample][f'SE_D{self._4x}'] = 0.
2153			for sample in self.unknowns:
2154				self.msg(f'Consolidating sample {sample}')
2155				self.unknowns[sample][f'session_D{self._4x}'] = {}
2156				session_avg = []
2157				for session in self.sessions:
2158					sdata = [r for r in self.sessions[session]['data'] if r['Sample'] == sample]
2159					if sdata:
2160						self.msg(f'{sample} found in session {session}')
2161						avg_D4x = np.mean([r[f'D{self._4x}'] for r in sdata])
2162						avg_d4x = np.mean([r[f'd{self._4x}'] for r in sdata])
2163						# !! TODO: sigma_s below does not account for temporal changes in standardization error
2164						sigma_s = self.standardization_error(session, avg_d4x, avg_D4x)
2165						sigma_u = sdata[0][f'wD{self._4x}raw'] / self.sessions[session]['a'] / len(sdata)**.5
2166						session_avg.append([avg_D4x, (sigma_u**2 + sigma_s**2)**.5])
2167						self.unknowns[sample][f'session_D{self._4x}'][session] = session_avg[-1]
2168				self.samples[sample][f'D{self._4x}'], self.samples[sample][f'SE_D{self._4x}'] = w_avg(*zip(*session_avg))
2169				weights = {s: self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 for s in self.unknowns[sample][f'session_D{self._4x}']}
2170				wsum = sum([weights[s] for s in weights])
2171				for s in weights:
2172					self.unknowns[sample][f'session_D{self._4x}'][s] += [self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 / wsum]
2173
2174		for r in self:
2175			r[f'D{self._4x}_residual'] = r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}']

Compile various statistics for each sample.

For each anchor sample:

  • D47 or D48: the nominal Δ4x value for this anchor, specified by self.Nominal_D4x
  • SE_D47 or SE_D48: set to zero by definition

For each unknown sample:

  • D47 or D48: the standardized Δ4x value for this unknown
  • SE_D47 or SE_D48: the standard error of Δ4x for this unknown

For each anchor and unknown:

  • N: the total number of analyses of this sample
  • SD_D47 or SD_D48: the “sample” (in the statistical sense) standard deviation for this sample
  • d13C_VPDB: the average δ13C_VPDB value for this sample
  • d18O_VSMOW: the average δ18O_VSMOW value for this sample (as CO2)
  • p_Levene: the p-value from a Levene test of equal variance, indicating whether the Δ4x repeatability of this sample differs significantly from that observed for the reference sample specified by self.LEVENE_REF_SAMPLE.
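
These statistics are stored in the `samples` attribute; a minimal sketch of reading them back, using a sample name from the tutorial example (`SD_D47` and `p_Levene` are only defined when a sample has enough replicates, hence `dict.get()`):

```python
s = mydata.samples['MYSAMPLE-2']
print(s['N'], s['D47'], s['SE_D47'])
print(s.get('SD_D47'), s.get('p_Levene'))  # None if too few replicates
```
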
def consolidate_sessions(self):
2179	def consolidate_sessions(self):
2180		'''
2181		Compute various statistics for each session.
2182
2183		+ `Na`: Number of anchor analyses in the session
2184		+ `Nu`: Number of unknown analyses in the session
2185		+ `r_d13C_VPDB`: δ13C_VPDB repeatability of analyses within the session
2186		+ `r_d18O_VSMOW`: δ18O_VSMOW repeatability of analyses within the session
2187		+ `r_D47` or `r_D48`: Δ4x repeatability of analyses within the session
2188		+ `a`: scrambling factor
2189		+ `b`: compositional slope
2190		+ `c`: WG offset
2191		+ `SE_a`: Model standard error of `a`
2192		+ `SE_b`: Model standard error of `b`
2193		+ `SE_c`: Model standard error of `c`
2194		+ `scrambling_drift` (boolean): whether to allow a temporal drift in the scrambling factor (`a`)
2195		+ `slope_drift` (boolean): whether to allow a temporal drift in the compositional slope (`b`)
2196		+ `wg_drift` (boolean): whether to allow a temporal drift in the WG offset (`c`)
2197		+ `a2`: scrambling factor drift
2198		+ `b2`: compositional slope drift
2199		+ `c2`: WG offset drift
2200		+ `Np`: Number of standardization parameters to fit
2201		+ `CM`: model covariance matrix for (`a`, `b`, `c`, `a2`, `b2`, `c2`)
2202		+ `d13Cwg_VPDB`: δ13C_VPDB of WG
2203		+ `d18Owg_VSMOW`: δ18O_VSMOW of WG
2204		'''
2205		for session in self.sessions:
2206			if 'd13Cwg_VPDB' not in self.sessions[session]:
2207				self.sessions[session]['d13Cwg_VPDB'] = self.sessions[session]['data'][0]['d13Cwg_VPDB']
2208			if 'd18Owg_VSMOW' not in self.sessions[session]:
2209				self.sessions[session]['d18Owg_VSMOW'] = self.sessions[session]['data'][0]['d18Owg_VSMOW']
2210			self.sessions[session]['Na'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.anchors])
2211			self.sessions[session]['Nu'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns])
2212
2213			self.msg(f'Computing repeatabilities for session {session}')
2214			self.sessions[session]['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors', sessions = [session])
2215			self.sessions[session]['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors', sessions = [session])
2216			self.sessions[session][f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', sessions = [session])
2217
2218		if self.standardization_method == 'pooled':
2219			for session in self.sessions:
2220
2221				self.sessions[session]['a'] = self.standardization.params.valuesdict()[f'a_{pf(session)}']
2222				i = self.standardization.var_names.index(f'a_{pf(session)}')
2223				self.sessions[session]['SE_a'] = self.standardization.covar[i,i]**.5
2224
2225				self.sessions[session]['b'] = self.standardization.params.valuesdict()[f'b_{pf(session)}']
2226				i = self.standardization.var_names.index(f'b_{pf(session)}')
2227				self.sessions[session]['SE_b'] = self.standardization.covar[i,i]**.5
2228
2229				self.sessions[session]['c'] = self.standardization.params.valuesdict()[f'c_{pf(session)}']
2230				i = self.standardization.var_names.index(f'c_{pf(session)}')
2231				self.sessions[session]['SE_c'] = self.standardization.covar[i,i]**.5
2232
2233				self.sessions[session]['a2'] = self.standardization.params.valuesdict()[f'a2_{pf(session)}']
2234				if self.sessions[session]['scrambling_drift']:
2235					i = self.standardization.var_names.index(f'a2_{pf(session)}')
2236					self.sessions[session]['SE_a2'] = self.standardization.covar[i,i]**.5
2237				else:
2238					self.sessions[session]['SE_a2'] = 0.
2239
2240				self.sessions[session]['b2'] = self.standardization.params.valuesdict()[f'b2_{pf(session)}']
2241				if self.sessions[session]['slope_drift']:
2242					i = self.standardization.var_names.index(f'b2_{pf(session)}')
2243					self.sessions[session]['SE_b2'] = self.standardization.covar[i,i]**.5
2244				else:
2245					self.sessions[session]['SE_b2'] = 0.
2246
2247				self.sessions[session]['c2'] = self.standardization.params.valuesdict()[f'c2_{pf(session)}']
2248				if self.sessions[session]['wg_drift']:
2249					i = self.standardization.var_names.index(f'c2_{pf(session)}')
2250					self.sessions[session]['SE_c2'] = self.standardization.covar[i,i]**.5
2251				else:
2252					self.sessions[session]['SE_c2'] = 0.
2253
2254				i = self.standardization.var_names.index(f'a_{pf(session)}')
2255				j = self.standardization.var_names.index(f'b_{pf(session)}')
2256				k = self.standardization.var_names.index(f'c_{pf(session)}')
2257				CM = np.zeros((6,6))
2258				CM[:3,:3] = self.standardization.covar[[i,j,k],:][:,[i,j,k]]
2259				try:
2260					i2 = self.standardization.var_names.index(f'a2_{pf(session)}')
2261					CM[3,[0,1,2,3]] = self.standardization.covar[i2,[i,j,k,i2]]
2262					CM[[0,1,2,3],3] = self.standardization.covar[[i,j,k,i2],i2]
2263					try:
2264						j2 = self.standardization.var_names.index(f'b2_{pf(session)}')
2265						CM[3,4] = self.standardization.covar[i2,j2]
2266						CM[4,3] = self.standardization.covar[j2,i2]
2267					except ValueError:
2268						pass
2269					try:
2270						k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2271						CM[3,5] = self.standardization.covar[i2,k2]
2272						CM[5,3] = self.standardization.covar[k2,i2]
2273					except ValueError:
2274						pass
2275				except ValueError:
2276					pass
2277				try:
2278					j2 = self.standardization.var_names.index(f'b2_{pf(session)}')
2279					CM[4,[0,1,2,4]] = self.standardization.covar[j2,[i,j,k,j2]]
2280					CM[[0,1,2,4],4] = self.standardization.covar[[i,j,k,j2],j2]
2281					try:
2282						k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2283						CM[4,5] = self.standardization.covar[j2,k2]
2284						CM[5,4] = self.standardization.covar[k2,j2]
2285					except ValueError:
2286						pass
2287				except ValueError:
2288					pass
2289				try:
2290					k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
2291					CM[5,[0,1,2,5]] = self.standardization.covar[k2,[i,j,k,k2]]
2292					CM[[0,1,2,5],5] = self.standardization.covar[[i,j,k,k2],k2]
2293				except ValueError:
2294					pass
2295
2296				self.sessions[session]['CM'] = CM
2297
2298		elif self.standardization_method == 'indep_sessions':
2299			pass # Not implemented yet

Compute various statistics for each session.

  • Na: Number of anchor analyses in the session
  • Nu: Number of unknown analyses in the session
  • r_d13C_VPDB: δ13C_VPDB repeatability of analyses within the session
  • r_d18O_VSMOW: δ18O_VSMOW repeatability of analyses within the session
  • r_D47 or r_D48: Δ4x repeatability of analyses within the session
  • a: scrambling factor
  • b: compositional slope
  • c: WG offset
  • SE_a: Model standard error of a
  • SE_b: Model standard error of b
  • SE_c: Model standard error of c
  • scrambling_drift (boolean): whether to allow a temporal drift in the scrambling factor (a)
  • slope_drift (boolean): whether to allow a temporal drift in the compositional slope (b)
  • wg_drift (boolean): whether to allow a temporal drift in the WG offset (c)
  • a2: scrambling factor drift
  • b2: compositional slope drift
  • c2: WG offset drift
  • Np: Number of standardization parameters to fit
  • CM: model covariance matrix for (a, b, c, a2, b2, c2)
  • d13Cwg_VPDB: δ13C_VPDB of WG
  • d18Owg_VSMOW: δ18O_VSMOW of WG
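
These statistics are stored in the `sessions` attribute; for example, for a `D47data` object (a sketch):

```python
for session in mydata.sessions:
    s = mydata.sessions[session]
    print(f"{session}: a = {s['a']:.3f} ± {s['SE_a']:.3f}, r_D47 = {1000 * s['r_D47']:.1f} ppm")
```
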
@make_verbal
def repeatabilities(self):
2302	@make_verbal
2303	def repeatabilities(self):
2304		'''
2305		Compute analytical repeatabilities for δ13C_VPDB, δ18O_VSMOW, Δ4x
2306		(for all samples, for anchors, and for unknowns).
2307		'''
2308		self.msg('Computing repeatabilities for all sessions')
2309
2310		self.repeatability['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors')
2311		self.repeatability['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors')
2312		self.repeatability[f'r_D{self._4x}a'] = self.compute_r(f'D{self._4x}', samples = 'anchors')
2313		self.repeatability[f'r_D{self._4x}u'] = self.compute_r(f'D{self._4x}', samples = 'unknowns')
2314		self.repeatability[f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', samples = 'all samples')

Compute analytical repeatabilities for δ13C_VPDB, δ18O_VSMOW, Δ4x (for all samples, for anchors, and for unknowns).

@make_verbal
def consolidate(self, tables=True, plots=True):
2317	@make_verbal
2318	def consolidate(self, tables = True, plots = True):
2319		'''
2320		Collect information about samples, sessions and repeatabilities.
2321		'''
2322		self.consolidate_samples()
2323		self.consolidate_sessions()
2324		self.repeatabilities()
2325
2326		if tables:
2327			self.summary()
2328			self.table_of_sessions()
2329			self.table_of_analyses()
2330			self.table_of_samples()
2331
2332		if plots:
2333			self.plot_sessions()

Collect information about samples, sessions and repeatabilities.

@make_verbal
def rmswd(self, samples='all samples', sessions='all sessions'):
2336	@make_verbal
2337	def rmswd(self,
2338		samples = 'all samples',
2339		sessions = 'all sessions',
2340		):
2341		'''
2342		Compute the χ2, root mean squared weighted deviation
2343		(i.e. the square root of the reduced χ2), and corresponding degrees of freedom of the
2344		Δ4x values for samples in `samples` and sessions in `sessions`.
2345		
2346		Only used in `D4xdata.standardize()` with `method='indep_sessions'`.
2347		'''
2348		if samples == 'all samples':
2349			mysamples = [k for k in self.samples]
2350		elif samples == 'anchors':
2351			mysamples = [k for k in self.anchors]
2352		elif samples == 'unknowns':
2353			mysamples = [k for k in self.unknowns]
2354		else:
2355			mysamples = samples
2356
2357		if sessions == 'all sessions':
2358			sessions = [k for k in self.sessions]
2359
2360		chisq, Nf = 0, 0
2361		for sample in mysamples :
2362			G = [ r for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2363			if len(G) > 1 :
2364				X, sX = w_avg([r[f'D{self._4x}'] for r in G], [r[f'wD{self._4x}'] for r in G])
2365				Nf += (len(G) - 1)
2366				chisq += np.sum([ ((r[f'D{self._4x}']-X)/r[f'wD{self._4x}'])**2 for r in G])
2367		r = (chisq / Nf)**.5 if Nf > 0 else 0
2368		self.msg(f'RMSWD of r["D{self._4x}"] is {r:.6f} for {samples}.')
2369		return {'rmswd': r, 'chisq': chisq, 'Nf': Nf}

Compute the χ2, root mean squared weighted deviation (i.e. the square root of the reduced χ2), and corresponding degrees of freedom of the Δ4x values for samples in samples and sessions in sessions.

Only used in D4xdata.standardize() with method='indep_sessions'.
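
Although mostly an internal helper, it can be called directly on a standardized dataset (a sketch):

```python
stats = mydata.rmswd(samples = 'anchors')
print(stats['rmswd'], stats['chisq'], stats['Nf'])
```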

@make_verbal
def compute_r(self, key, samples='all samples', sessions='all sessions'):
2372	@make_verbal
2373	def compute_r(self, key, samples = 'all samples', sessions = 'all sessions'):
2374		'''
2375		Compute the repeatability of `[r[key] for r in self]`
2376		'''
2377
2378		if samples == 'all samples':
2379			mysamples = [k for k in self.samples]
2380		elif samples == 'anchors':
2381			mysamples = [k for k in self.anchors]
2382		elif samples == 'unknowns':
2383			mysamples = [k for k in self.unknowns]
2384		else:
2385			mysamples = samples
2386
2387		if sessions == 'all sessions':
2388			sessions = [k for k in self.sessions]
2389
2390		if key in ['D47', 'D48']:
2391			# Full disclosure: the definition of Nf is tricky/debatable
2392			G = [r for r in self if r['Sample'] in mysamples and r['Session'] in sessions]
2393			chisq = (np.array([r[f'{key}_residual'] for r in G])**2).sum()
2394			Nf = len(G)
2395# 			print(f'len(G) = {Nf}')
2396			Nf -= len([s for s in mysamples if s in self.unknowns])
2397# 			print(f'{len([s for s in mysamples if s in self.unknowns])} unknown samples to consider')
2398			for session in sessions:
2399				Np = len([
2400					_ for _ in self.standardization.params
2401					if (
2402						self.standardization.params[_].expr is not None
2403						and (
2404							(_[0] in 'abc' and _[1] == '_' and _[2:] == pf(session))
2405							or (_[0] in 'abc' and _[1:3] == '2_' and _[3:] == pf(session))
2406							)
2407						)
2408					])
2409# 				print(f'session {session}: {Np} parameters to consider')
2410				Na = len({
2411					r['Sample'] for r in self.sessions[session]['data']
2412					if r['Sample'] in self.anchors and r['Sample'] in mysamples
2413					})
2414# 				print(f'session {session}: {Na} different anchors in that session')
2415				Nf -= min(Np, Na)
2416# 			print(f'Nf = {Nf}')
2417
2418# 			for sample in mysamples :
2419# 				X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2420# 				if len(X) > 1 :
2421# 					chisq += np.sum([ (x-self.samples[sample][key])**2 for x in X ])
2422# 					if sample in self.unknowns:
2423# 						Nf += len(X) - 1
2424# 					else:
2425# 						Nf += len(X)
2426# 			if samples in ['anchors', 'all samples']:
2427# 				Nf -= sum([self.sessions[s]['Np'] for s in sessions])
2428			r = (chisq / Nf)**.5 if Nf > 0 else 0
2429
2430		else: # if key not in ['D47', 'D48']
2431			chisq, Nf = 0, 0
2432			for sample in mysamples :
2433				X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ]
2434				if len(X) > 1 :
2435					Nf += len(X) - 1
2436					chisq += np.sum([ (x-np.mean(X))**2 for x in X ])
2437			r = (chisq / Nf)**.5 if Nf > 0 else 0
2438
2439		self.msg(f'Repeatability of r["{key}"] is {1000*r:.1f} ppm for {samples}.')
2440		return r

Compute the repeatability of [r[key] for r in self]
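
For example (a sketch; any per-analysis field may be passed as `key`):

```python
# Pooled repeatability of δ13C_VPDB over anchor analyses, across all sessions:
r_d13C = mydata.compute_r('d13C_VPDB', samples = 'anchors')

# Δ47 repeatability over unknowns only:
r_D47u = mydata.compute_r('D47', samples = 'unknowns')
```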

def sample_average(self, samples, weights='equal', normalize=True):
2442	def sample_average(self, samples, weights = 'equal', normalize = True):
2443		'''
2444		Weighted average Δ4x value of a group of samples, accounting for covariance.
2445
2446		Returns the weighted average Δ4x value and associated SE
2447		of a group of samples. Weights are equal by default. If `normalize` is
2448		true, `weights` will be rescaled so that their sum equals 1.
2449
2450		**Examples**
2451
2452		```python
2453		self.sample_average(['X','Y'], [1, 2])
2454		```
2455
2456		returns the value and SE of [Δ4x(X) + 2 Δ4x(Y)]/3,
2457		where Δ4x(X) and Δ4x(Y) are the average Δ4x
2458		values of samples X and Y, respectively.
2459
2460		```python
2461		self.sample_average(['X','Y'], [1, -1], normalize = False)
2462		```
2463
2464		returns the value and SE of the difference Δ4x(X) - Δ4x(Y).
2465		'''
2466		if weights == 'equal':
2467			weights = [1/len(samples)] * len(samples)
2468
2469		if normalize:
2470			s = sum(weights)
2471			if s:
2472				weights = [w/s for w in weights]
2473
2474		try:
2475# 			indices = [self.standardization.var_names.index(f'D47_{pf(sample)}') for sample in samples]
2476# 			C = self.standardization.covar[indices,:][:,indices]
2477			C = np.array([[self.sample_D4x_covar(x, y) for x in samples] for y in samples])
2478			X = [self.samples[sample][f'D{self._4x}'] for sample in samples]
2479			return correlated_sum(X, C, weights)
2480		except ValueError:
2481			return (0., 0.)

Weighted average Δ4x value of a group of samples, accounting for covariance.

Returns the weighted average Δ4x value and associated SE of a group of samples. Weights are equal by default. If normalize is true, weights will be rescaled so that their sum equals 1.

Examples

self.sample_average(['X','Y'], [1, 2])

returns the value and SE of [Δ4x(X) + 2 Δ4x(Y)]/3, where Δ4x(X) and Δ4x(Y) are the average Δ4x values of samples X and Y, respectively.

self.sample_average(['X','Y'], [1, -1], normalize = False)

returns the value and SE of the difference Δ4x(X) - Δ4x(Y).

def sample_D4x_covar(self, sample1, sample2=None):
2484	def sample_D4x_covar(self, sample1, sample2 = None):
2485		'''
2486		Covariance between Δ4x values of samples
2487
2488		Returns the error covariance between the average Δ4x values of two
2489		samples. If only `sample1` is specified, or if `sample1 == sample2`,
2490		returns the Δ4x variance for that sample.
2491		'''
2492		if sample2 is None:
2493			sample2 = sample1
2494		if self.standardization_method == 'pooled':
2495			i = self.standardization.var_names.index(f'D{self._4x}_{pf(sample1)}')
2496			j = self.standardization.var_names.index(f'D{self._4x}_{pf(sample2)}')
2497			return self.standardization.covar[i, j]
2498		elif self.standardization_method == 'indep_sessions':
2499			if sample1 == sample2:
2500				return self.samples[sample1][f'SE_D{self._4x}']**2
2501			else:
2502				c = 0
2503				for session in self.sessions:
2504					sdata1 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample1]
2505					sdata2 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample2]
2506					if sdata1 and sdata2:
2507						a = self.sessions[session]['a']
2508						# !! TODO: CM below does not account for temporal changes in standardization parameters
2509						CM = self.sessions[session]['CM'][:3,:3]
2510						avg_D4x_1 = np.mean([r[f'D{self._4x}'] for r in sdata1])
2511						avg_d4x_1 = np.mean([r[f'd{self._4x}'] for r in sdata1])
2512						avg_D4x_2 = np.mean([r[f'D{self._4x}'] for r in sdata2])
2513						avg_d4x_2 = np.mean([r[f'd{self._4x}'] for r in sdata2])
2514						c += (
2515							self.unknowns[sample1][f'session_D{self._4x}'][session][2]
2516							* self.unknowns[sample2][f'session_D{self._4x}'][session][2]
2517							* np.array([[avg_D4x_1, avg_d4x_1, 1]])
2518							@ CM
2519							@ np.array([[avg_D4x_2, avg_d4x_2, 1]]).T
2520							) / a**2
2521				return float(c)

Covariance between Δ4x values of samples

Returns the error covariance between the average Δ4x values of two samples. If only sample1 is specified, or if sample1 == sample2, returns the Δ4x variance for that sample.

def sample_D4x_correl(self, sample1, sample2=None):
2523	def sample_D4x_correl(self, sample1, sample2 = None):
2524		'''
2525		Correlation between Δ4x errors of samples
2526
2527		Returns the error correlation between the average Δ4x values of two samples.
2528		'''
2529		if sample2 is None or sample2 == sample1:
2530			return 1.
2531		return (
2532			self.sample_D4x_covar(sample1, sample2)
2533			/ self.unknowns[sample1][f'SE_D{self._4x}']
2534			/ self.unknowns[sample2][f'SE_D{self._4x}']
2535			)

Correlation between Δ4x errors of samples

Returns the error correlation between the average Δ4x values of two samples.
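
A sketch using the two unknowns from the tutorial example:

```python
var = mydata.sample_D4x_covar('MYSAMPLE-1')                # Δ47 variance of one sample
cov = mydata.sample_D4x_covar('MYSAMPLE-1', 'MYSAMPLE-2')  # error covariance
rho = mydata.sample_D4x_correl('MYSAMPLE-1', 'MYSAMPLE-2') # error correlation
```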

def plot_single_session(self, session, kw_plot_anchors={'ls': 'None', 'marker': 'x', 'mec': (0.75, 0, 0), 'mew': 0.75, 'ms': 4}, kw_plot_unknowns={'ls': 'None', 'marker': 'x', 'mec': (0, 0, 0.75), 'mew': 0.75, 'ms': 4}, kw_plot_anchor_avg={'ls': '-', 'marker': 'None', 'color': (0.75, 0, 0), 'lw': 0.75}, kw_plot_unknown_avg={'ls': '-', 'marker': 'None', 'color': (0, 0, 0.75), 'lw': 0.75}, kw_contour_error={'colors': [[0, 0, 0]], 'alpha': 0.5, 'linewidths': 0.75}, xylimits='free', x_label=None, y_label=None, error_contour_interval='auto', fig='new'):
2537	def plot_single_session(self,
2538		session,
2539		kw_plot_anchors = dict(ls='None', marker='x', mec=(.75, 0, 0), mew = .75, ms = 4),
2540		kw_plot_unknowns = dict(ls='None', marker='x', mec=(0, 0, .75), mew = .75, ms = 4),
2541		kw_plot_anchor_avg = dict(ls='-', marker='None', color=(.75, 0, 0), lw = .75),
2542		kw_plot_unknown_avg = dict(ls='-', marker='None', color=(0, 0, .75), lw = .75),
2543		kw_contour_error = dict(colors = [[0, 0, 0]], alpha = .5, linewidths = 0.75),
2544		xylimits = 'free', # | 'constant'
2545		x_label = None,
2546		y_label = None,
2547		error_contour_interval = 'auto',
2548		fig = 'new',
2549		):
2550		'''
2551		Generate plot for a single session
2552		'''
2553		if x_label is None:
2554			x_label = f'δ$_{{{self._4x}}}$ (‰)'
2555		if y_label is None:
2556			y_label = f'Δ$_{{{self._4x}}}$ (‰)'
2557
2558		out = _SessionPlot()
2559		anchors = [a for a in self.anchors if [r for r in self.sessions[session]['data'] if r['Sample'] == a]]
2560		unknowns = [u for u in self.unknowns if [r for r in self.sessions[session]['data'] if r['Sample'] == u]]
2561		
2562		if fig == 'new':
2563			out.fig = ppl.figure(figsize = (6,6))
2564			ppl.subplots_adjust(.1,.1,.9,.9)
2565
2566		out.anchor_analyses, = ppl.plot(
2567			[r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors],
2568			[r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors],
2569			**kw_plot_anchors)
2570		out.unknown_analyses, = ppl.plot(
2571			[r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns],
2572			[r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns],
2573			**kw_plot_unknowns)
2574		out.anchor_avg = ppl.plot(
2575			np.array([ np.array([
2576				np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
2577				np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
2578				]) for sample in anchors]).T,
2579			np.array([ np.array([0, 0]) + self.Nominal_D4x[sample] for sample in anchors]).T,
2580			**kw_plot_anchor_avg)
2581		out.unknown_avg = ppl.plot(
2582			np.array([ np.array([
2583				np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
2584				np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
2585				]) for sample in unknowns]).T,
2586			np.array([ np.array([0, 0]) + self.unknowns[sample][f'D{self._4x}'] for sample in unknowns]).T,
2587			**kw_plot_unknown_avg)
2588		if xylimits == 'constant':
2589			x = [r[f'd{self._4x}'] for r in self]
2590			y = [r[f'D{self._4x}'] for r in self]
2591			x1, x2, y1, y2 = np.min(x), np.max(x), np.min(y), np.max(y)
2592			w, h = x2-x1, y2-y1
2593			x1 -= w/20
2594			x2 += w/20
2595			y1 -= h/20
2596			y2 += h/20
2597			ppl.axis([x1, x2, y1, y2])
2598		elif xylimits == 'free':
2599			x1, x2, y1, y2 = ppl.axis()
2600		else:
2601			x1, x2, y1, y2 = ppl.axis(xylimits)
2602				
2603		if error_contour_interval != 'none':
2604			xi, yi = np.linspace(x1, x2), np.linspace(y1, y2)
2605			XI,YI = np.meshgrid(xi, yi)
2606			SI = np.array([[self.standardization_error(session, x, y) for x in xi] for y in yi])
2607			if error_contour_interval == 'auto':
2608				rng = np.max(SI) - np.min(SI)
2609				if rng <= 0.01:
2610					cinterval = 0.001
2611				elif rng <= 0.03:
2612					cinterval = 0.004
2613				elif rng <= 0.1:
2614					cinterval = 0.01
2615				elif rng <= 0.3:
2616					cinterval = 0.03
2617				elif rng <= 1.:
2618					cinterval = 0.1
2619				else:
2620					cinterval = 0.5
2621			else:
2622				cinterval = error_contour_interval
2623
2624			cval = np.arange(np.ceil(SI.min() / .001) * .001, np.ceil(SI.max() / .001 + 1) * .001, cinterval)
2625			out.contour = ppl.contour(XI, YI, SI, cval, **kw_contour_error)
2626			out.clabel = ppl.clabel(out.contour)
2627
2628		ppl.xlabel(x_label)
2629		ppl.ylabel(y_label)
2630		ppl.title(session, weight = 'bold')
2631		ppl.grid(alpha = .2)
2632		out.ax = ppl.gca()		
2633
2634		return out

Generate plot for a single session
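
A sketch of stand-alone use (the session name is hypothetical; use any key of `mydata.sessions`):

```python
from matplotlib import pyplot as ppl

sp = mydata.plot_single_session('Session01')  # hypothetical session name
ppl.savefig('single_session.pdf')
ppl.close(sp.fig)
```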

def plot_residuals(self, kde=False, hist=False, binwidth=0.6666666666666666, dir='output', filename=None, highlight=[], colors=None, figsize=None, dpi=100, yspan=None):
2636	def plot_residuals(
2637		self,
2638		kde = False,
2639		hist = False,
2640		binwidth = 2/3,
2641		dir = 'output',
2642		filename = None,
2643		highlight = [],
2644		colors = None,
2645		figsize = None,
2646		dpi = 100,
2647		yspan = None,
2648		):
2649		'''
2650		Plot residuals of each analysis as a function of time (actually, as a function of
2651		the order of analyses in the `D4xdata` object)
2652
2653		+ `kde`: whether to add a kernel density estimate of residuals
2654		+ `hist`: whether to add a histogram of residuals (incompatible with `kde`)
2655		+ `binwidth`: the width of the histogram bins, in units of the Δ4x repeatability
2656		+ `dir`: the directory in which to save the plot
2657		+ `highlight`: a list of samples to highlight
2658		+ `colors`: a dict of `{<sample>: <color>}` for all samples
2659		+ `figsize`: (width, height) of figure
2660		+ `dpi`: resolution for PNG output
2661		+ `yspan`: factor controlling the range of y values shown in plot
2662		  (by default: `yspan = 1.5 if kde else 1.0`)
2663		'''
2664		
2665		from matplotlib import ticker
2666
2667		if yspan is None:
2668			if kde:
2669				yspan = 1.5
2670			else:
2671				yspan = 1.0
2672		
2673		# Layout
2674		fig = ppl.figure(figsize = (8,4) if figsize is None else figsize)
2675		if hist or kde:
2676			ppl.subplots_adjust(left = .08, bottom = .05, right = .98, top = .8, wspace = -0.72)
2677			ax1, ax2 = ppl.subplot(121), ppl.subplot(1,15,15)
2678		else:
2679			ppl.subplots_adjust(.08,.05,.78,.8)
2680			ax1 = ppl.subplot(111)
2681		
2682		# Colors
2683		N = len(self.anchors)
2684		if colors is None:
2685			if len(highlight) > 0:
2686				Nh = len(highlight)
2687				if Nh == 1:
2688					colors = {highlight[0]: (0,0,0)}
2689				elif Nh == 3:
2690					colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0)])}
2691				elif Nh == 4:
2692					colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
2693				else:
2694					colors = {a: hls_to_rgb(k/Nh, .4, 1) for k,a in enumerate(highlight)}
2695			else:
2696				if N == 3:
2697					colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0)])}
2698				elif N == 4:
2699					colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
2700				else:
2701					colors = {a: hls_to_rgb(k/N, .4, 1) for k,a in enumerate(self.anchors)}
2702
2703		ppl.sca(ax1)
2704		
2705		ppl.axhline(0, color = 'k', alpha = .25, lw = 0.75)
2706
2707		ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: f'${x:+.0f}$' if x else '$0$'))
2708
2709		session = self[0]['Session']
2710		x1 = 0
2711# 		ymax = np.max([1e3 * (r['D47'] - self.samples[r['Sample']]['D47']) for r in self])
2712		x_sessions = {}
2713		one_or_more_singlets = False
2714		one_or_more_multiplets = False
2715		multiplets = set()
2716		for k,r in enumerate(self):
2717			if r['Session'] != session:
2718				x2 = k-1
2719				x_sessions[session] = (x1+x2)/2
2720				ppl.axvline(k - 0.5, color = 'k', lw = .5)
2721				session = r['Session']
2722				x1 = k
2723			singlet = len(self.samples[r['Sample']]['data']) == 1
2724			if not singlet:
2725				multiplets.add(r['Sample'])
2726			if r['Sample'] in self.unknowns:
2727				if singlet:
2728					one_or_more_singlets = True
2729				else:
2730					one_or_more_multiplets = True
2731			kw = dict(
2732				marker = 'x' if singlet else '+',
2733				ms = 4 if singlet else 5,
2734				ls = 'None',
2735				mec = colors[r['Sample']] if r['Sample'] in colors else (0,0,0),
2736				mew = 1,
2737				alpha = 0.2 if singlet else 1,
2738				)
2739			if highlight and r['Sample'] not in highlight:
2740				kw['alpha'] = 0.2
2741			ppl.plot(k, 1e3 * r[f'D{self._4x}_residual'], **kw)
2742		x2 = k
2743		x_sessions[session] = (x1+x2)/2
2744
2745		ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000, self.repeatability[f'r_D{self._4x}']*1000, color = 'k', alpha = .05, lw = 1)
2746		ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000*self.t95, self.repeatability[f'r_D{self._4x}']*1000*self.t95, color = 'k', alpha = .05, lw = 1)
2747		if not (hist or kde):
2748			ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000, f"   SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm", size = 9, alpha = 1, va = 'center')
2749			ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000*self.t95, f"   95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm", size = 9, alpha = 1, va = 'center')
2750
2751		xmin, xmax, ymin, ymax = ppl.axis()
2752		if yspan != 1:
2753			ymin, ymax = (ymin + ymax)/2 - yspan * (ymax - ymin)/2, (ymin + ymax)/2 + yspan * (ymax - ymin)/2
2754		for s in x_sessions:
2755			ppl.text(
2756				x_sessions[s],
2757				ymax +1,
2758				s,
2759				va = 'bottom',
2760				**(
2761					dict(ha = 'center')
2762					if len(self.sessions[s]['data']) > (0.15 * len(self))
2763					else dict(ha = 'left', rotation = 45)
2764					)
2765				)
2766
2767		if hist or kde:
2768			ppl.sca(ax2)
2769
2770		for s in colors:
2771			kw['marker'] = '+'
2772			kw['ms'] = 5
2773			kw['mec'] = colors[s]
2774			kw['label'] = s
2775			kw['alpha'] = 1
2776			ppl.plot([], [], **kw)
2777
2778		kw['mec'] = (0,0,0)
2779
2780		if one_or_more_singlets:
2781			kw['marker'] = 'x'
2782			kw['ms'] = 4
2783			kw['alpha'] = .2
2784			kw['label'] = 'other (N$\\,$=$\\,$1)' if one_or_more_multiplets else 'other'
2785			ppl.plot([], [], **kw)
2786
2787		if one_or_more_multiplets:
2788			kw['marker'] = '+'
2789			kw['ms'] = 4
2790			kw['alpha'] = 1
2791			kw['label'] = 'other (N$\\,$>$\\,$1)' if one_or_more_singlets else 'other'
2792			ppl.plot([], [], **kw)
2793
2794		if hist or kde:
2795			leg = ppl.legend(loc = 'upper right', bbox_to_anchor = (1, 1), bbox_transform=fig.transFigure, borderaxespad = 1.5, fontsize = 9)
2796		else:
2797			leg = ppl.legend(loc = 'lower right', bbox_to_anchor = (1, 0), bbox_transform=fig.transFigure, borderaxespad = 1.5)
2798		leg.set_zorder(-1000)
2799
2800		ppl.sca(ax1)
2801
2802		ppl.ylabel(f'Δ$_{{{self._4x}}}$ residuals (ppm)')
2803		ppl.xticks([])
2804		ppl.axis([-1, len(self), None, None])
2805
2806		if hist or kde:
2807			ppl.sca(ax2)
2808			X = 1e3 * np.array([r[f'D{self._4x}_residual'] for r in self if r['Sample'] in multiplets or r['Sample'] in self.anchors])
2809
2810			if kde:
2811				from scipy.stats import gaussian_kde
2812				yi = np.linspace(ymin, ymax, 201)
2813				xi = gaussian_kde(X).evaluate(yi)
2814				ppl.fill_betweenx(yi, xi, xi*0, fc = (0,0,0,.15), lw = 1, ec = (.75,.75,.75,1))
2815# 				ppl.plot(xi, yi, 'k-', lw = 1)
2816			elif hist:
2817				ppl.hist(
2818					X,
2819					orientation = 'horizontal',
2820					histtype = 'stepfilled',
2821					ec = [.4]*3,
2822					fc = [.25]*3,
2823					alpha = .25,
2824					bins = np.linspace(-9e3*self.repeatability[f'r_D{self._4x}'], 9e3*self.repeatability[f'r_D{self._4x}'], int(18/binwidth+1)),
2825					)
2826			ppl.text(0, 0,
2827				f"   SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm\n   95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm",
2828				size = 7.5,
2829				alpha = 1,
2830				va = 'center',
2831				ha = 'left',
2832				)
2833
2834			ppl.axis([0, None, ymin, ymax])
2835			ppl.xticks([])
2836			ppl.yticks([])
2837# 			ax2.spines['left'].set_visible(False)
2838			ax2.spines['right'].set_visible(False)
2839			ax2.spines['top'].set_visible(False)
2840			ax2.spines['bottom'].set_visible(False)
2841
2842		ax1.axis([None, None, ymin, ymax])
2843
2844		if not os.path.exists(dir):
2845			os.makedirs(dir)
2846		if filename is None:
2847			return fig
2848		elif filename == '':
2849			filename = f'D{self._4x}_residuals.pdf'
2850		ppl.savefig(f'{dir}/{filename}', dpi = dpi)
2851		ppl.close(fig)

Plot residuals of each analysis as a function of time (actually, as a function of the order of analyses in the D4xdata object)

  • kde: whether to add a kernel density estimate of residuals
  • hist: whether to add a histogram of residuals (incompatible with kde)
  • binwidth: the width of the histogram bins, in units of the Δ4x repeatability
  • dir: the directory in which to save the plot
  • highlight: a list of samples to highlight
  • colors: a dict of {<sample>: <color>} for all samples
  • figsize: (width, height) of figure
  • dpi: resolution for PNG output
  • yspan: factor controlling the range of y values shown in plot (by default: yspan = 1.5 if kde else 1.0)
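
Two calling patterns, following the `filename` logic in the code above (a sketch):

```python
# Save a residual plot with a KDE side panel under the default name
# (output/D47_residuals.pdf for a D47data object):
mydata.plot_residuals(kde = True, filename = '')

# Or keep the figure object for further tweaking (filename = None returns it unsaved):
fig = mydata.plot_residuals(hist = True)
```
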
def simulate(self, *args, **kwargs):
2854	def simulate(self, *args, **kwargs):
2855		'''
2856		Legacy function with warning message pointing to `virtual_data()`
2857		'''
2858		raise DeprecationWarning('D4xdata.simulate is deprecated and has been replaced by virtual_data()')

Legacy function with warning message pointing to virtual_data()

def plot_distribution_of_analyses(self, dir='output', filename=None, vs_time=False, figsize=(6, 4), subplots_adjust=(0.02, 0.13, 0.85, 0.8), output=None, dpi=100):
2860	def plot_distribution_of_analyses(
2861		self,
2862		dir = 'output',
2863		filename = None,
2864		vs_time = False,
2865		figsize = (6,4),
2866		subplots_adjust = (0.02, 0.13, 0.85, 0.8),
2867		output = None,
2868		dpi = 100,
2869		):
2870		'''
2871		Plot temporal distribution of all analyses in the data set.
2872		
2873		**Parameters**
2874
2875		+ `dir`: the directory in which to save the plot
2876		+ `filename`: the name of the file to save the plot to
2877		+ `vs_time`: if `True`, plot as a function of `TimeTag` rather than sequentially
2878		+ `figsize`: (width, height) of figure
2879		+ `dpi`: resolution for PNG output
2880		'''
2881
2882		asamples = [s for s in self.anchors]
2883		usamples = [s for s in self.unknowns]
2884		if output is None or output == 'fig':
2885			fig = ppl.figure(figsize = figsize)
2886			ppl.subplots_adjust(*subplots_adjust)
2887		Xmin = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
2888		Xmax = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
2889		Xmax += (Xmax-Xmin)/40
2890		Xmin -= (Xmax-Xmin)/41
2891		for k, s in enumerate(asamples + usamples):
2892			if vs_time:
2893				X = [r['TimeTag'] for r in self if r['Sample'] == s]
2894			else:
2895				X = [x for x,r in enumerate(self) if r['Sample'] == s]
2896			Y = [-k for x in X]
2897			ppl.plot(X, Y, 'o', mec = None, mew = 0, mfc = 'b' if s in usamples else 'r', ms = 3, alpha = .75)
2898			ppl.axhline(-k, color = 'b' if s in usamples else 'r', lw = .5, alpha = .25)
2899			ppl.text(Xmax, -k, f'   {s}', va = 'center', ha = 'left', size = 7, color = 'b' if s in usamples else 'r')
2900		ppl.axis([Xmin, Xmax, -k-1, 1])
2901		ppl.xlabel('\ntime')
2902		ppl.gca().annotate('',
2903			xy = (0.6, -0.02),
2904			xycoords = 'axes fraction',
2905			xytext = (.4, -0.02), 
2906            arrowprops = dict(arrowstyle = "->", color = 'k'),
2907            )
2908			
2909
2910		x2 = -1
2911		for session in self.sessions:
2912			x1 = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
2913			if vs_time:
2914				ppl.axvline(x1, color = 'k', lw = .75)
2915			if x2 > -1:
2916				if not vs_time:
2917					ppl.axvline((x1+x2)/2, color = 'k', lw = .75, alpha = .5)
2918			x2 = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
2919# 			from xlrd import xldate_as_datetime
2920# 			print(session, xldate_as_datetime(x1, 0), xldate_as_datetime(x2, 0))
2921			if vs_time:
2922				ppl.axvline(x2, color = 'k', lw = .75)
2923				ppl.axvspan(x1,x2,color = 'k', zorder = -100, alpha = .15)
2924			ppl.text((x1+x2)/2, 1, f' {session}', ha = 'left', va = 'bottom', rotation = 45, size = 8)
2925
2926		ppl.xticks([])
2927		ppl.yticks([])
2928
2929		if output is None:
2930			if not os.path.exists(dir):
2931				os.makedirs(dir)
2932			if filename == None:
2933				filename = f'D{self._4x}_distribution_of_analyses.pdf'
2934			ppl.savefig(f'{dir}/{filename}', dpi = dpi)
2935			ppl.close(fig)
2936		elif output == 'ax':
2937			return ppl.gca()
2938		elif output == 'fig':
2939			return fig

Plot temporal distribution of all analyses in the data set.

Parameters

  • dir: the directory in which to save the plot
  • filename: the name of the file to save the plot to
  • vs_time: if True, plot as a function of TimeTag rather than sequentially
  • figsize: (width, height) of figure
  • dpi: resolution for PNG output
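
For example (a sketch; `vs_time = True` requires a `TimeTag` field in the raw data):

```python
mydata.plot_distribution_of_analyses(vs_time = True, dir = 'output')
```
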
def plot_bulk_compositions(self, samples=None, dir='output/bulk_compositions', figsize=(6, 6), subplots_adjust=(0.15, 0.12, 0.95, 0.92), show=False, sample_color=(0, 0.5, 1), analysis_color=(0.7, 0.7, 0.7), labeldist=0.3, radius=0.05):
2942	def plot_bulk_compositions(
2943		self,
2944		samples = None,
2945		dir = 'output/bulk_compositions',
2946		figsize = (6,6),
2947		subplots_adjust = (0.15, 0.12, 0.95, 0.92),
2948		show = False,
2949		sample_color = (0,.5,1),
2950		analysis_color = (.7,.7,.7),
2951		labeldist = 0.3,
2952		radius = 0.05,
2953		):
2954		'''
2955		Plot δ13C_VPDB vs δ18O_VSMOW (of CO2) for all analyses.
2956		
2957		By default, creates a directory `./output/bulk_compositions` where plots for
2958		each sample are saved. Another plot named `__all__.pdf` shows all analyses together.
2959		
2960		
2961		**Parameters**
2962
2963		+ `samples`: Only these samples are processed (by default: all samples).
2964		+ `dir`: where to save the plots
2965		+ `figsize`: (width, height) of figure
2966		+ `subplots_adjust`: passed to `subplots_adjust()`
2967		+ `show`: whether to call `matplotlib.pyplot.show()` on the plot with all samples,
2968		allowing for interactive visualization/exploration in (δ13C, δ18O) space.
2969		+ `sample_color`: color used for sample markers/labels
2970		+ `analysis_color`: color used for replicate markers/labels
2971		+ `labeldist`: distance (in inches) from replicate markers to replicate labels
2972		+ `radius`: radius of the dashed circle providing scale. No circle if `radius = 0`.
2973		'''
2974
2975		from matplotlib.patches import Ellipse
2976
2977		if samples is None:
2978			samples = [_ for _ in self.samples]
2979
2980		saved = {}
2981
2982		for s in samples:
2983
2984			fig = ppl.figure(figsize = figsize)
2985			fig.subplots_adjust(*subplots_adjust)
2986			ax = ppl.subplot(111)
2987			ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
2988			ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')
2989			ppl.title(s)
2990
2991
2992			XY = np.array([[_['d18O_VSMOW'], _['d13C_VPDB']] for _ in self.samples[s]['data']])
2993			UID = [_['UID'] for _ in self.samples[s]['data']]
2994			XY0 = XY.mean(0)
2995
2996			for xy in XY:
2997				ppl.plot([xy[0], XY0[0]], [xy[1], XY0[1]], '-', lw = 1, color = analysis_color)
2998				
2999			ppl.plot(*XY.T, 'wo', mew = 1, mec = analysis_color)
3000			ppl.plot(*XY0, 'wo', mew = 2, mec = sample_color)
3001			ppl.text(*XY0, f'  {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
3002			saved[s] = [XY, XY0]
3003			
3004			x1, x2, y1, y2 = ppl.axis()
3005			x0, dx = (x1+x2)/2, (x2-x1)/2
3006			y0, dy = (y1+y2)/2, (y2-y1)/2
3007			dx, dy = [max(max(dx, dy), radius)]*2
3008
3009			ppl.axis([
3010				x0 - 1.2*dx,
3011				x0 + 1.2*dx,
3012				y0 - 1.2*dy,
3013				y0 + 1.2*dy,
3014				])			
3015
3016			XY0_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(XY0))
3017
3018			for xy, uid in zip(XY, UID):
3019
3020				xy_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(xy))
3021				vector_in_display_space = xy_in_display_space - XY0_in_display_space
3022
3023				if (vector_in_display_space**2).sum() > 0:
3024
3025					unit_vector_in_display_space = vector_in_display_space / ((vector_in_display_space**2).sum())**0.5
3026					label_vector_in_display_space = vector_in_display_space + unit_vector_in_display_space * labeldist
3027					label_xy_in_display_space = XY0_in_display_space + label_vector_in_display_space
3028					label_xy_in_data_space = ax.transData.inverted().transform(fig.dpi_scale_trans.transform(label_xy_in_display_space))
3029
3030					ppl.text(*label_xy_in_data_space, uid, va = 'center', ha = 'center', color = analysis_color)
3031
3032				else:
3033
3034					ppl.text(*xy, f'{uid}  ', va = 'center', ha = 'right', color = analysis_color)
3035
3036			if radius:
3037				ax.add_artist(Ellipse(
3038					xy = XY0,
3039					width = radius*2,
3040					height = radius*2,
3041					ls = (0, (2,2)),
3042					lw = .7,
3043					ec = analysis_color,
3044					fc = 'None',
3045					))
3046				ppl.text(
3047					XY0[0],
3048					XY0[1]-radius,
3049					f'\n± {radius*1e3:.0f} ppm',
3050					color = analysis_color,
3051					va = 'top',
3052					ha = 'center',
3053					linespacing = 0.4,
3054					size = 8,
3055					)
3056
3057			if not os.path.exists(dir):
3058				os.makedirs(dir)
3059			fig.savefig(f'{dir}/{s}.pdf')
3060			ppl.close(fig)
3061
3062		fig = ppl.figure(figsize = figsize)
3063		fig.subplots_adjust(*subplots_adjust)
3064		ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
3065		ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')
3066
3067		for s in saved:
3068			for xy in saved[s][0]:
3069				ppl.plot([xy[0], saved[s][1][0]], [xy[1], saved[s][1][1]], '-', lw = 1, color = analysis_color)
3070			ppl.plot(*saved[s][0].T, 'wo', mew = 1, mec = analysis_color)
3071			ppl.plot(*saved[s][1], 'wo', mew = 1.5, mec = sample_color)
3072			ppl.text(*saved[s][1], f'  {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
3073
3074		x1, x2, y1, y2 = ppl.axis()
3075		ppl.axis([
3076			x1 - (x2-x1)/10,
3077			x2 + (x2-x1)/10,
3078			y1 - (y2-y1)/10,
3079			y2 + (y2-y1)/10,
3080			])			
3081
3082
3083		if not os.path.exists(dir):
3084			os.makedirs(dir)
3085		fig.savefig(f'{dir}/__all__.pdf')
3086		if show:
3087			ppl.show()
3088		ppl.close(fig)

Plot δ13C_VPDB vs δ18O_VSMOW (of CO2) for all analyses.

By default, creates a directory ./output/bulk_compositions where plots for each sample are saved. Another plot named __all__.pdf shows all analyses together.

Parameters

  • samples: Only these samples are processed (by default: all samples).
  • dir: where to save the plots
  • figsize: (width, height) of figure
  • subplots_adjust: passed to subplots_adjust()
  • show: whether to call matplotlib.pyplot.show() on the plot with all samples, allowing for interactive visualization/exploration in (δ13C, δ18O) space.
  • sample_color: color used for sample markers/labels
  • analysis_color: color used for replicate markers/labels
  • labeldist: distance (in inches) from replicate markers to replicate labels
  • radius: radius of the dashed circle providing scale. No circle if radius = 0.
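A usage sketch under the same assumptions (a processed D47data object named mydata; the sample names are hypothetical):

```py
# Create one plot per sample, plus __all__.pdf, in the default directory:
mydata.plot_bulk_compositions()

# Restrict to two samples and draw a dashed scale circle of radius 0.05 permil,
# labeled "± 50 ppm" in the plot:
mydata.plot_bulk_compositions(
	samples = ['MYSAMPLE-1', 'MYSAMPLE-2'],
	radius = 0.05,
	)
```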
Inherited Members

  • builtins.list: clear, copy, append, insert, extend, pop, remove, index, count, reverse, sort
class D47data(D4xdata):
3130class D47data(D4xdata):
3131	'''
3132	Store and process data for a large set of Δ47 analyses,
3133	usually comprising more than one analytical session.
3134	'''
3135
3136	Nominal_D4x = {
3137		'ETH-1':   0.2052,
3138		'ETH-2':   0.2085,
3139		'ETH-3':   0.6132,
3140		'ETH-4':   0.4511,
3141		'IAEA-C1': 0.3018,
3142		'IAEA-C2': 0.6409,
3143		'MERCK':   0.5135,
3144		} # I-CDES (Bernasconi et al., 2021)
3145	'''
3146	Nominal Δ47 values assigned to the Δ47 anchor samples, used by
3147	`D47data.standardize()` to normalize unknown samples to an absolute Δ47
3148	reference frame.
3149
3150	By default equal to (after [Bernasconi et al. (2021)](https://doi.org/10.1029/2020GC009588)):
3151	```py
3152	{
3153		'ETH-1'   : 0.2052,
3154		'ETH-2'   : 0.2085,
3155		'ETH-3'   : 0.6132,
3156		'ETH-4'   : 0.4511,
3157		'IAEA-C1' : 0.3018,
3158		'IAEA-C2' : 0.6409,
3159		'MERCK'   : 0.5135,
3160	}
3161	```
3162	'''
3163
3164
3165	@property
3166	def Nominal_D47(self):
3167		return self.Nominal_D4x
3168	
3169
3170	@Nominal_D47.setter
3171	def Nominal_D47(self, new):
3172		self.Nominal_D4x = dict(**new)
3173		self.refresh()
3174
3175
3176	def __init__(self, l = [], **kwargs):
3177		'''
3178		**Parameters:** same as `D4xdata.__init__()`
3179		'''
3180		D4xdata.__init__(self, l = l, mass = '47', **kwargs)
3181
3182
3183	def D47fromTeq(self, fCo2eqD47 = 'petersen', priority = 'new'):
3184		'''
3185		Find all samples for which `Teq` is specified, compute equilibrium Δ47
3186		value for that temperature, and treat these samples as additional anchors.
3187
3188		**Parameters**
3189
3190		+ `fCo2eqD47`: Which CO2 equilibrium law to use
3191		(`petersen`: [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127);
3192	`wang`: [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)).
3193		+ `priority`: if `replace`: forget old anchors and only use the new ones;
3194		if `new`: keep pre-existing anchors but update them in case of conflict
3195		between old and new Δ47 values;
3196		if `old`: keep pre-existing anchors but preserve their original Δ47
3197		values in case of conflict.
3198		'''
3199		f = {
3200			'petersen': fCO2eqD47_Petersen,
3201			'wang': fCO2eqD47_Wang,
3202			}[fCo2eqD47]
3203		foo = {}
3204		for r in self:
3205			if 'Teq' in r:
3206				if r['Sample'] in foo:
3207					assert foo[r['Sample']] == f(r['Teq']), f'Different values of `Teq` provided for sample `{r["Sample"]}`.'
3208				else:
3209					foo[r['Sample']] = f(r['Teq'])
3210			else:
3211				assert r['Sample'] not in foo, f'`Teq` is inconsistently specified for sample `{r["Sample"]}`.'
3212
3213		if priority == 'replace':
3214			self.Nominal_D47 = {}
3215		for s in foo:
3216			if priority != 'old' or s not in self.Nominal_D47:
3217				self.Nominal_D47[s] = foo[s]
3218	
3219	def save_D47_correl(self, *args, **kwargs):
3220		return self._save_D4x_correl(*args, **kwargs)
3221
3222	save_D47_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D47')

Store and process data for a large set of Δ47 analyses, usually comprising more than one analytical session.

D47data(l=[], **kwargs)
3176	def __init__(self, l = [], **kwargs):
3177		'''
3178		**Parameters:** same as `D4xdata.__init__()`
3179		'''
3180		D4xdata.__init__(self, l = l, mass = '47', **kwargs)

Parameters: same as D4xdata.__init__()

Nominal_D4x = {'ETH-1': 0.2052, 'ETH-2': 0.2085, 'ETH-3': 0.6132, 'ETH-4': 0.4511, 'IAEA-C1': 0.3018, 'IAEA-C2': 0.6409, 'MERCK': 0.5135}

Nominal Δ47 values assigned to the Δ47 anchor samples, used by D47data.standardize() to normalize unknown samples to an absolute Δ47 reference frame.

By default equal to (after Bernasconi et al. (2021)):

{
        'ETH-1'   : 0.2052,
        'ETH-2'   : 0.2085,
        'ETH-3'   : 0.6132,
        'ETH-4'   : 0.4511,
        'IAEA-C1' : 0.3018,
        'IAEA-C2' : 0.6409,
        'MERCK'   : 0.5135,
}
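Because Nominal_D47 is a settable property (backed by Nominal_D4x), the anchor set may be customized before standardizing. A sketch restricting standardization to the three ETH anchors, keeping their default I-CDES values:

```py
import D47crunch

mydata = D47crunch.D47data()
# Standardize using only the three ETH anchors; assigning to Nominal_D47
# also triggers refresh():
mydata.Nominal_D47 = {
	'ETH-1': 0.2052,
	'ETH-2': 0.2085,
	'ETH-3': 0.6132,
	}
```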
def D47fromTeq(self, fCo2eqD47='petersen', priority='new'):
3183	def D47fromTeq(self, fCo2eqD47 = 'petersen', priority = 'new'):
3184		'''
3185		Find all samples for which `Teq` is specified, compute equilibrium Δ47
3186		value for that temperature, and treat these samples as additional anchors.
3187
3188		**Parameters**
3189
3190		+ `fCo2eqD47`: Which CO2 equilibrium law to use
3191		(`petersen`: [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127);
3192	`wang`: [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)).
3193		+ `priority`: if `replace`: forget old anchors and only use the new ones;
3194		if `new`: keep pre-existing anchors but update them in case of conflict
3195		between old and new Δ47 values;
3196		if `old`: keep pre-existing anchors but preserve their original Δ47
3197		values in case of conflict.
3198		'''
3199		f = {
3200			'petersen': fCO2eqD47_Petersen,
3201			'wang': fCO2eqD47_Wang,
3202			}[fCo2eqD47]
3203		foo = {}
3204		for r in self:
3205			if 'Teq' in r:
3206				if r['Sample'] in foo:
3207					assert foo[r['Sample']] == f(r['Teq']), f'Different values of `Teq` provided for sample `{r["Sample"]}`.'
3208				else:
3209					foo[r['Sample']] = f(r['Teq'])
3210			else:
3211				assert r['Sample'] not in foo, f'`Teq` is inconsistently specified for sample `{r["Sample"]}`.'
3212
3213		if priority == 'replace':
3214			self.Nominal_D47 = {}
3215		for s in foo:
3216			if priority != 'old' or s not in self.Nominal_D47:
3217				self.Nominal_D47[s] = foo[s]

Find all samples for which Teq is specified, compute equilibrium Δ47 value for that temperature, and treat these samples as additional anchors.

Parameters

  • fCo2eqD47: Which CO2 equilibrium law to use (petersen: Petersen et al. (2019); wang: Wang et al. (2004)).
  • priority: if replace: forget old anchors and only use the new ones; if new: keep pre-existing anchors but update them in case of conflict between old and new Δ47 values; if old: keep pre-existing anchors but preserve their original Δ47 values in case of conflict.
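A sketch of promoting equilibrated-gas analyses to anchors, assuming some analyses in mydata carry a Teq field (consistent within each sample):

```py
# Analyses carrying a 'Teq' field are grouped by sample; each such sample is
# assigned its equilibrium Δ47 value (here from the Petersen et al. law) and
# added to Nominal_D47 as an extra anchor before standardization:
mydata.D47fromTeq(fCo2eqD47 = 'petersen', priority = 'new')
mydata.standardize()
```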
def save_D47_correl(self, *args, **kwargs):
3219	def save_D47_correl(self, *args, **kwargs):
3220		return self._save_D4x_correl(*args, **kwargs)

Save D47 values along with their SE and correlation matrix.

Parameters

  • samples: Only these samples are output (by default: all samples).
  • dir: the directory in which to save the file (by default: output)
  • filename: the name of the csv file to write to (by default: D47_correl.csv)
  • D47_precision: the precision to use when writing D47 and D47_SE values (by default: 4)
  • correl_precision: the precision to use when writing correlation factor values (by default: 4)
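For instance, to export the standardized values (the arguments below are the documented defaults):

```py
# Write D47, D47_SE and the sample-to-sample correlation matrix to a csv file:
mydata.save_D47_correl(dir = 'output', filename = 'D47_correl.csv')
```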
class D48data(D4xdata):
3225class D48data(D4xdata):
3226	'''
3227	Store and process data for a large set of Δ48 analyses,
3228	usually comprising more than one analytical session.
3229	'''
3230
3231	Nominal_D4x = {
3232		'ETH-1':  0.138,
3233		'ETH-2':  0.138,
3234		'ETH-3':  0.270,
3235		'ETH-4':  0.223,
3236		'GU-1':  -0.419,
3237		} # (Fiebig et al., 2019, 2021)
3238	'''
3239	Nominal Δ48 values assigned to the Δ48 anchor samples, used by
3240	`D48data.standardize()` to normalize unknown samples to an absolute Δ48
3241	reference frame.
3242
3243	By default equal to (after [Fiebig et al. (2019)](https://doi.org/10.1016/j.chemgeo.2019.05.019),
3244	[Fiebig et al. (2021)](https://doi.org/10.1016/j.gca.2021.07.012)):
3245
3246	```py
3247	{
3248		'ETH-1' :  0.138,
3249		'ETH-2' :  0.138,
3250		'ETH-3' :  0.270,
3251		'ETH-4' :  0.223,
3252		'GU-1'  : -0.419,
3253	}
3254	```
3255	'''
3256
3257
3258	@property
3259	def Nominal_D48(self):
3260		return self.Nominal_D4x
3261
3262	
3263	@Nominal_D48.setter
3264	def Nominal_D48(self, new):
3265		self.Nominal_D4x = dict(**new)
3266		self.refresh()
3267
3268
3269	def __init__(self, l = [], **kwargs):
3270		'''
3271		**Parameters:** same as `D4xdata.__init__()`
3272		'''
3273		D4xdata.__init__(self, l = l, mass = '48', **kwargs)
3274
3275	def save_D48_correl(self, *args, **kwargs):
3276		return self._save_D4x_correl(*args, **kwargs)
3277
3278	save_D48_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D48')

Store and process data for a large set of Δ48 analyses, usually comprising more than one analytical session.

D48data(l=[], **kwargs)
3269	def __init__(self, l = [], **kwargs):
3270		'''
3271		**Parameters:** same as `D4xdata.__init__()`
3272		'''
3273		D4xdata.__init__(self, l = l, mass = '48', **kwargs)

Parameters: same as D4xdata.__init__()

Nominal_D4x = {'ETH-1': 0.138, 'ETH-2': 0.138, 'ETH-3': 0.27, 'ETH-4': 0.223, 'GU-1': -0.419}

Nominal Δ48 values assigned to the Δ48 anchor samples, used by D48data.standardize() to normalize unknown samples to an absolute Δ48 reference frame.

By default equal to (after Fiebig et al. (2019), Fiebig et al. (2021)):

{
        'ETH-1' :  0.138,
        'ETH-2' :  0.138,
        'ETH-3' :  0.270,
        'ETH-4' :  0.223,
        'GU-1'  : -0.419,
}
def save_D48_correl(self, *args, **kwargs):
3275	def save_D48_correl(self, *args, **kwargs):
3276		return self._save_D4x_correl(*args, **kwargs)

Save D48 values along with their SE and correlation matrix.

Parameters

  • samples: Only these samples are output (by default: all samples).
  • dir: the directory in which to save the file (by default: output)
  • filename: the name of the csv file to write to (by default: D48_correl.csv)
  • D48_precision: the precision to use when writing D48 and D48_SE values (by default: 4)
  • correl_precision: the precision to use when writing correlation factor values (by default: 4)
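Δ48 analyses follow the same processing steps as their Δ47 counterparts; a minimal sketch, assuming a hypothetical rawdata.csv that includes a d48 column:

```py
import D47crunch

mydata48 = D47crunch.D48data()
mydata48.read('rawdata.csv')   # hypothetical file; must include a d48 column
mydata48.wg()
mydata48.crunch()
mydata48.standardize()
mydata48.save_D48_correl()     # writes output/D48_correl.csv by default
```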
class D49data(D4xdata):
3281class D49data(D4xdata):
3282	'''
3283	Store and process data for a large set of Δ49 analyses,
3284	usually comprising more than one analytical session.
3285	'''
3286	
3287	Nominal_D4x = {"1000C": 0.0, "25C": 2.228}  # Wang 2004
3288	'''
3289	Nominal Δ49 values assigned to the Δ49 anchor samples, used by
3290	`D49data.standardize()` to normalize unknown samples to an absolute Δ49
3291	reference frame.
3292
3293	By default equal to (after [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)):
3294
3295	```py
3296	{
3297		"1000C": 0.0,
3298		"25C": 2.228
3299	}
3300	```
3301	'''
3302	
3303	@property
3304	def Nominal_D49(self):
3305		return self.Nominal_D4x
3306	
3307	@Nominal_D49.setter
3308	def Nominal_D49(self, new):
3309		self.Nominal_D4x = dict(**new)
3310		self.refresh()
3311	
3312	def __init__(self, l=[], **kwargs):
3313		'''
3314		**Parameters:** same as `D4xdata.__init__()`
3315		'''
3316		D4xdata.__init__(self, l=l, mass='49', **kwargs)
3317	
3318	def save_D49_correl(self, *args, **kwargs):
3319		return self._save_D4x_correl(*args, **kwargs)
3320	
3321	save_D49_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D49')

Store and process data for a large set of Δ49 analyses, usually comprising more than one analytical session.

D49data(l=[], **kwargs)
3312	def __init__(self, l=[], **kwargs):
3313		'''
3314		**Parameters:** same as `D4xdata.__init__()`
3315		'''
3316		D4xdata.__init__(self, l=l, mass='49', **kwargs)

Parameters: same as D4xdata.__init__()

Nominal_D4x = {'1000C': 0.0, '25C': 2.228}

Nominal Δ49 values assigned to the Δ49 anchor samples, used by D49data.standardize() to normalize unknown samples to an absolute Δ49 reference frame.

By default equal to (after Wang et al. (2004)):

{
        "1000C": 0.0,
        "25C": 2.228
}
def save_D49_correl(self, *args, **kwargs):
3318	def save_D49_correl(self, *args, **kwargs):
3319		return self._save_D4x_correl(*args, **kwargs)

Save D49 values along with their SE and correlation matrix.

Parameters

  • samples: Only these samples are output (by default: all samples).
  • dir: the directory in which to save the file (by default: output)
  • filename: the name of the csv file to write to (by default: D49_correl.csv)
  • D49_precision: the precision to use when writing D49 and D49_SE values (by default: 4)
  • correl_precision: the precision to use when writing correlation factor values (by default: 4)
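Note that the default Δ49 anchors are CO2 gases equilibrated at 1000 °C and 25 °C rather than carbonate standards. A sketch assuming a hypothetical raw data file that includes analyses of samples named '1000C' and '25C':

```py
import D47crunch

mydata49 = D47crunch.D49data()
mydata49.read('rawdata.csv')   # hypothetical file with '1000C' and '25C' gas analyses
mydata49.wg()
mydata49.crunch()
mydata49.standardize()
mydata49.save_D49_correl()     # writes output/D49_correl.csv by default
```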