[Questions in bold for group discussion, which should be also addressed in written reports. Suggested answers are indicated in italics]
Accompanying this exercise is a file Nordic lake chemistry 1995.txt
containing water chemistry data from lakes in Finland, Sweden, and Norway, sampled in the 1995 Nordic lake survey (Henriksen et al. 1998 https://www.jstor.org/stable/4314692). First take a look at the data file by opening it with a text editor like Notepad. The first line contains the names of the columns in the data table, and after that the data follow. This means that the data file has a header and should be treated accordingly.
Start R and set the working directory to where you have put the file Nordic lake chemistry 1995.txt.
Review the syntax of the command for importing data (write ?read.table
in the console window). Start a new R script and compose an R statement that imports the data into a data.frame
object called N95
.
N95 <- read.table("Nordic lake chemistry 1995.txt", header=TRUE)
Use the command names(N95)
to see the variable names in the imported data. The first 3 variables are geographical locators, identifying the sampled lakes by country (Nation
), latitude (Lat
), and longitude (Lon
). The coordinates are un-projected decimal degrees in WGS84 datum (i.e. standard GPS coordinates). The next 10 variables are concentrations of different chemical macro-constituents in each lake. K25
is the specific conductivity at 25 \(^{\circ}\)C (mS / m) – a measure of total dissolved ionic substances. Ca
, Mg
, Na
, and K
are concentrations of the major base cations while Cl
, SO4
, and NO3
are concentrations of strong acid anions (What are the corresponding strong acids of these anions? hydrochloric, sulfuric, and nitric acid) The bicarbonate ion (HCO3
) is a weak acid anion which constitutes the major buffering system of aquatic systems.
All concentrations of major ions are reported as charge equivalents (\(\mu\)eq / L), which is defined as molar concentration multiplied by the absolute value of the ion’s valence (https://en.wikipedia.org/wiki/Equivalent_(chemistry)). Charge equivalents are particularly useful for making charge balance calculations as we will do later in this exercise.
If your variable names don’t match these you probably forgot to tell R that the file has a header. Use the dim
and summary
commands to get an overview of the size of the data set and the distributions of the individual parameters.
How many oberservations are there in the data set, and how are these distributed among the 3 participant countries? How many orders of magnitude do the ionic components vary? Can you see signs of skewness in the distributions just from the summary statistics? What would be the ionic compositions of the Nordic “median lake”? The data set has 4887 observations on 13 variables. There are >3 times as many observations from Sweden than from Norway and Finland, respectively (3019 vs. 872 / 996). Most ionic parameters vary at least 3 orders of magnitude. Means being up to twice as high as medians is a strong indication of right-skewedness. The “median lake” would have a pH close to neutral (6.7), and dominated by calcium and bicarbonate as major cat- and anions (148 and 122 \(\mu\)eq / L, respectively). Potassium and nitrate have the lowest medians (10 and 1.3 \(\mu\)eq / L), with sodium and magnesium (67 and 63 \(\mu\)eq / L), and sulfate and chloride (58 and 39 \(\mu\)eq / L) in intermediate positions.
dim(N95)
## [1] 4887 13
summary(N95)
## Nation Lat Lon pH
## Finland: 872 Min. : 0.00 Min. : 0.00 Min. :3.800
## Norway : 996 1st Qu.:59.59 1st Qu.:13.15 1st Qu.:6.340
## Sweden :3019 Median :61.93 Median :16.02 Median :6.740
## Mean :62.26 Mean :16.97 Mean :6.608
## 3rd Qu.:65.16 3rd Qu.:20.48 3rd Qu.:7.020
## Max. :71.06 Max. :31.37 Max. :8.780
## K25 Ca Mg Na
## Min. : 0.270 Min. : 1.0 Min. : 1.00 Min. : 3.0
## 1st Qu.: 2.090 1st Qu.: 77.0 1st Qu.: 35.00 1st Qu.: 45.0
## Median : 3.350 Median : 148.0 Median : 63.00 Median : 67.0
## Mean : 5.085 Mean : 269.6 Mean : 86.31 Mean : 118.5
## 3rd Qu.: 6.180 3rd Qu.: 290.0 3rd Qu.: 106.00 3rd Qu.: 151.0
## Max. :75.900 Max. :8121.0 Max. :1935.00 Max. :2222.0
## K HCO3 Cl SO4
## Min. : 0.00 Min. :-215 Min. : 3.00 Min. : 4.0
## 1st Qu.: 6.00 1st Qu.: 50 1st Qu.: 20.00 1st Qu.: 33.0
## Median : 10.00 Median : 122 Median : 39.00 Median : 58.0
## Mean : 16.18 Mean : 205 Mean : 99.66 Mean : 105.5
## 3rd Qu.: 19.00 3rd Qu.: 219 3rd Qu.: 134.50 3rd Qu.: 129.0
## Max. :335.00 Max. :4885 Max. :2353.00 Max. :5342.0
## NO3
## Min. : 0.00
## 1st Qu.: 0.43
## Median : 1.36
## Mean : 3.70
## 3rd Qu.: 4.00
## Max. :304.00
We can use boxplots of the individual ion concentrations to get a visual impression of their distributions. Notice that there are many variations of boxplots (https://en.wikipedia.org/wiki/Box_plot): the default in R is so-called Tukey boxplots (the actual algorithm is in the help file ?boxplot.stats
for some reason). We can make multiple boxplots with the command plot(x ~ g)
with x
being continuous and g
a factor variable.
Because R cannot make boxplots of data that are distributed over several columns, we first have to rearrange the table. What we need is a table where you find the ion names in one column and its respective value value in a second one. This is sometimes called transforming data from “wide” to “long” format. For this we can use the function stack
which creates a new data frame with only two columns. The first, called values
, combines all the columns of a data frame into one long column, while the second, called ind
, is a factor variable with as many levels as the number of columns in the original data frame.
We only want to stack the columns with ion concentrations so we first make a new data frame ion
with just columns 6 to 13 (N95[, 6:13]
) and then use the stack operation on that. In other words, plotting values ~ ind
for the stacked data frame will give boxplots for each of the 8 ions. Since the distributions are quite skewed, you might consider plotting log10(values) ~ ind
instead.
ion <- N95[, 6:13]
plot(values ~ ind, data=stack(ion))
plot(log10(values) ~ ind, data=stack(ion))
What do the boxplots tell us about the distributions of the major ions? Which ions are most and least abundant? What are the pros and cons of log-transforming these variables? We notice that all ions have strongly skewed distributions, with >75% (3rd quartile = upper edges of the boxes) being quite small compared to the remaining values. Log-transformation make all the distributions quite symmetrical, but we should worry about the “NaNs produced” warnings since these reflect that some variables (K
, HCO3
, NO3
) contain non-positive number
If we instead use the plot
function directly on the ion
data frame, we get a scatterplot matrix where each grid cell contains the bivariate distribution of a pair of ions. Since there are almost 5000 data points, we can make the plots much less cluttered by specifying the plotting symbols to the smallest possible dots (pch="."
) instead of the default circles.
Are there any pairs of ions that seem to be particularly closely related? Can you think of possible explanations for any such relationships? Two pairs of ions stand out as being particularly closely related: sodium (Na\(^+\)) and chloride (Cl\(^-\)), and calcium (Ca\(^{2+}\)) and bicarbonate (HCO\(_3^{-}\)). The former pair represents the major constituent of seawater salts (NaCl), while the latter is the main weathering product of calcite (CaCO\(_3\)) and other calcareous rocks and minerals
plot(ion, pch=".")
In this part we will investigate geographical visualizations of major ion distributions. There are many way to do this in R - here we will use a package called leaflet (http://rstudio.github.io/leaflet/) which has an easy interface for making nice maps that can be zoomed and scrolled. You need to install the leaflet package if it is not already available on your system. You can do this simply by copying the command install.packages("leaflet")
to your console window (you only need to do this once - or at least until the next time you upgrade R).
When we looked at the summary statistics of N95
you may have noticed that the minimum values of latitude (Lat
) and longitude (Lon
) are both zero, which is the coordinates of a point in the Gulf of Guinea, about 600 km south of the coast of Ghana. In other words, these values are unlikely to be coordinates lakes in northern Europe, and more likely to represent lakes with missing coordinate values. Use the sum
function on the logical expression N95\$Lat == 0
to find out how many records that have missing coordinates (notice the double equality symbols (==
) in logical identity tests). Only 9 lakes have missing coordinates so they will hopefully not matter much anyway
sum(N95$Lat == 0)
## [1] 9
Since we do not know the position of lakes with no coordinates, we need to exclude these lakes. We do this by using the subset
function to create a new table N95.1
containg the subset of records where both Lat > 0
and Lon > 0
.
N95.1 <- subset(N95, (Lat > 0) & (Lon > 0))
In order to use the leaflet package, we first need to load it by the command library(leaflet)
. The 2 most important leaflet commands for our purpose are addTiles
to create a map and addCircles
to put markers on the map. We can use the “pipeline” operator %>%
to feed the map created by addTiles
directly into addCircles
. The latter needs arguments defining the coordinates (lat=
, lng=
), and where those variables are located (data=
)
library(leaflet)
addTiles(leaflet()) %>%
addCircles(data = N95.1, lat = ~ Lat, lng = ~ Lon)
The resulting map has a circle marking the location of each lake in the study. Notice that we can pan out all the way to “where are we in the world?”, as well as zoom in and scroll through specific regions.
But we can also use size, shape, and color of the markers to represent other lake properties in addition to location, for example chloride concentration. We choose to color-code the markers with a yellow-green-blue gradient representing low to high chloride concentration. To make the colors of the markers more distinct we can also increase the value of the fillOpacity
argument from the default 0.2 to 0.5. We also add an additional pipeline step send the color-coded map to the addlegend
function so that we can see what the different colors mean.
Cl.pal <- colorNumeric(palette="YlGnBu", domain=log10(N95.1$Cl))
addTiles(leaflet()) %>%
addCircles(data=N95.1, lat= ~ Lat, lng= ~ Lon,
fillOpacity = 0.5, color= ~Cl.pal(log10(Cl))) %>%
addLegend("bottomright", data=N95.1, pal = Cl.pal, values = ~log10(Cl),
title = "log10(Cl)", opacity = 1)
Comment on the spatial patterns you see in the map? Discuss possible sources and mechanisms for whatever patterns you identify. *We see a strong coast-to-inland gradient perpendicular to the coastline of Norway, probably caused by sea-salt aerosols washed out by orographic precipitation (https://en.wikipedia.org/wiki/Orographic_lift) as oceanic air hits land. The elevated chloride concentrations in the lowland landscapes of southern Sweden and Finland are on the other hand probably due to salt-rich marine sediments deposited in high sea-level periods during the last post-glacial rebound (https://en.wikipedia.org/wiki/Post-glacial_rebound)*
Make similar maps for calcium, pH, and sulfate (not a good idea to log transform pH – why?). Try to explain the main factors determining the geographical distributions of these parameters.
Ca.pal <- colorNumeric(palette="RdYlBu", domain=log10(N95.1$Ca))
addTiles(leaflet()) %>%
addCircles(data=N95.1, lat= ~ Lat, lng= ~ Lon,
fillOpacity = 0.5, color= ~Ca.pal(log10(Ca))) %>%
addLegend("bottomright", data=N95.1, pal = Ca.pal, values = ~log10(Ca),
title = "log10(Ca)", opacity = 1)
Lakes in western and central Norway are particularly low in calcium due to shallow soils and calcium-poor pre-Cambrian shield bedrock. The strong correlation between calcium and bicarbonate means that low-calcium lakes have low buffering capacity, and thus high vulnerability to acidification
pH.pal <- colorNumeric(palette="RdYlBu", domain=N95.1$pH)
addTiles(leaflet()) %>%
addCircles(data=N95.1, lat= ~ Lat, lng= ~ Lon,
fillOpacity = 0.5, color= ~pH.pal(pH)) %>%
addLegend("bottomright", data=N95.1, pal = pH.pal, values = ~pH,
title = "pH", opacity = 1)
While lakes in the whole of western Norway have low calcium and low buffering capacity, it is lakes in southern Norway that have the lowest pH. Probably due to a closer proximity major sources of acid precipitation further south
SO4.pal <- colorNumeric(palette="YlOrRd", domain=log10(N95.1$SO4))
addTiles(leaflet()) %>%
addCircles(data=N95.1, lat= ~ Lat, lng= ~ Lon,
fillOpacity = 0.5, color= ~SO4.pal(log10(SO4))) %>%
addLegend("bottomright", data=N95.1, pal = SO4.pal, values = ~log10(SO4),
title = "log10(SO4)", opacity = 1)
Being a major constituent of sea-salt, sulfate shows a certain similarity to the chloride pattern, but there is also a north-south gradient that could be related to transport of acid rain from more industrialized areas in central Europe. Is there any way to separate the sea-salt contribution to sulfate deposition from other sources?
Sea water contains practically fixed proportions between major ions. We can use this together with the fact that practically all chloride in Nordic surface waters has a marine origin to calculate the marine contribution of other ions in freshwater samples. The charge equivalent contributions relative to chloride is as follows:
Ca | Mg | Na | K | HCO3 | Cl | SO4 | NO3 |
---|---|---|---|---|---|---|---|
0.037 | 0.193 | 0.852 | 0.018 | 0.004 | 1 | 0.103 | 0 |
To visualize the marine contribution we can for example plot Na
against Cl
and add the theoretical line with slope 0.852 as reference (using abline(0, 0.852)
). When there are a high number of data points like in this data set, it can often be useful to plot overlapping dots with a semi-transparent color: for example col = gray(0.2, 0.2)
will plot individual dots in dark gray with 80% transparency. Do this for Ca
, Mg
, Na
, K
, HCO3
, and SO4
, each with a red line representing the theoretical sea-water contribution. To see all 6 plots on one page you can use the command par(mfrow=c(2,3))
, which will give us 2 rows with 3 plots in each on the same page.
par(mfrow=c(2,3))
for (i in c(1:5, 7)) {
plot(ion[, i] ~ Cl, data=ion, ylab=names(ion)[i],
col = gray(0.2, 0.2), pch=19, cex=0.6)
abline(a=0, b=seawater.Cl[i], col="red")
}
Which lake water ions have major sea water contributions? And which have very little contribution? How do we interpret this? Calcium and bicarbonate have hardly any sea-salt contribution at all. This is partly because seawater concentrations of these ions are constrained by the solubility of CaCO\(_3\) in seawater, and also due to sea-salt contributions generally being small compared to local sources from weathering process in the watershed. The absence of dots below the seawater line means sea-salts from aerosols and marine soils are major sources of chloride in this landscape, such that the chloride concentration thus puts a lower bound on all the other ionic constituents. In addition to sodium, we see that potassium and magnesium concentrations in inland waters also can have substantial sea-salt contributions. Using chloride as proxy, we can subtract the theoretical sea-salt contribution to calculate what is often termed “non-marine sulfate”. Non-marine sources of sulfate include acid rain as well as microbial oxidation of sulfides in the local watershed
Weak acids have the property of of not being completely dissociated when dissolved in water. The degree of dissociation depends on the presence of other ions in the solution, such that it will decrease if we add strong acid, or increase if we add strong base. In other words; weak acids will act as buffers, thus diminishing the pH change resulting from addition of of strong acid or base. Carbonic acid (H\(_2\)CO\(_3\)) is the most abundant weak acid in both marine and inland waters. It dissociates in 2 steps to bicarbonate (HCO\(_3^-\)) and carbonate (CO\(_3^{2-}\)), each by losing one proton.
\[ H_2CO_3 \rightleftharpoons H^+ + HCO_3^- \rightleftharpoons 2H^+ + CO_3^{2-}\]
The bicarbonate system is thus the most important buffering system in aquatic systems. We can do all kinds of computations on the bicarbonate system using the aquaenv package for R (https://cran.r-project.org/web/packages/AquaEnv/vignettes/AquaEnv.pdf), which we can install directly from CRAN with install.packages("AquaEnv")
. We can then let AquaEnv solve the non-linear dissociation equations of the bicarbonate system as function of pH in a socalled Bjerrum plot:
library(AquaEnv)
bjerrum <- aquaenv(S=0, t=15, SumCO2=0.001, pH=seq(4, 11, 0.01))
with(bjerrum, {
plot(CO2 ~ pH, type="l", lwd=3)
lines(HCO3 ~ pH, col="red", lwd=3)
lines(CO3 ~ pH, col="blue", lwd=3)
})
Here we have set the sum of all carbonic acid species (SumCO2) equal 1 mM = 0.001 M and varied pH from 4 to 11. How do you explain the curves? We see that as we increase pH, we displace dissociation equilibria to the right by first by converting carbonic acid into bicarbonate, and then bicarbonate into carbonate.
All of the major ions can be measured fast and precisely with ion chromatography, except bicarbonate which is measured by titration which is both time-consuming and imprecise at low concentrations. As alternative, we can estimate bicarbonate as the difference between cat- and anions in the 7 remaining strong base or acid constituents. This difference, which is called Acid Neutralizing Capacity (ANC) is technically not just bicarbonate but the sum of all weak acids and bases in the solution.
We can estimate ANC very efficiently from our ion data by matrix arithmetic. Start with using the as.matrix
function to cast the data frame ion
into a matrix X
(although data frames and matrices are both rectangular tables, only matrices obey the rules of matrix arithmetic). Then we use the matrix
function to make a 8-element column vector z
with values [1, 1, 1, 1, 0, -1, -1 -1]’. Since X
has the same number of columns as z
has rows the dimensions in the matrix product X %*% z
are compatible and will result in a column vector where each cell is the sum of the 4 strong base cations (Ca, mg, Na, K) minus the sum of the 3 strong acid anions (Cl, SO4, NO3) for each lake in the data set - in other words, the quantity we call ANC:
X <- as.matrix(ion)
z <- matrix(c(1, 1, 1, 1, 0, -1, -1, -1), ncol=1)
ANC <- X %*% z
plot(ANC ~ HCO3, data=ion, col = gray(0.2, 0.2), pch=19)
abline(0, 1, col="red")
Plot ANC against measured HCO3. Are there any systematic deviations between the two? Discuss possible causes of such deviations. We see that with perhaps 2 exceptions, ANC is generally higher than HCO3. Since lakes need to be electrically neutral, this extra charge difference has to be from some other source than bicarbonate. These could be other, weak inorganic acids like boric (H3BO3) or orthosilisic (H4SiO4), but the most likely candidates are probably organic acids of fulvic or humic nature.
Dissolved ions can transport charge, which is why we can use the conductivity of a solution as a measure of the total concentration of ions in it. Conductivity is the inverse of electrical resistance (measured as Siemens, S = ohm\(^{-1}\)), and specific conductivity is the inverse resistance per unit distance across a water sample (mS m\(^{-1}\)). Different ions have different contributions to the specific conductivity, depending mainly on their charge and the diffusivity.
Since each ion has a well-defined molar conductance at a given temperature, and since individual ions contribute independently to the conductivity (at least at high dilution), we can calculate the specific conductivity contribution from each of the ions in a solution. The molar conductance \(\Lambda\) of a particular ion at a certain temperature can be predicted with the following equation (https://www.hydrochemistry.eu/exmpls/sc.html):
\[ \Lambda = {{z^2 F^2} \over {R T}} D \]
Where \(z\) is the charge of the ion (dimensionless), \(F\) is Faraday’s constant (96485.3 Coulomb mol\(^{-1}\)), \(R\) is the molar gas constant (8.314472 Joule mol\(^{-1}\) K\(^{-1}\)), \(T\) is absolute temperature (K), and D (m\(^2\) s\(^{-1}\)) is the molecular diffusion coefficient of the ion. To convince yourself that the unit of \(\Lambda\) will actually be S m\(^{-1}\) (mol m\(^{-3}\))\(^{-1}\), you need to recall that 1 Siemens = ohm\(^{-1}\) = 1 Ampere / Volt, that 1 Ampere = 1 Coulomb / s, and 1 Volt = 1 Joule / Coulomb. Notice also that S m\(^{-1}\) (mol m\(^{-3}\))\(^{-1}\) = mS m\(^{-1}\) (mmol m\(^{-3}\))\(^{-1}\) such that if concentrations are in units of mmol m\(^{-3}\) = \(\mu\)mol L\(^{-1}\) we get the theoretical conductivity in the same unit as reported in our data set (mS m\(^{-1}\)).
If \(m\) is the molar concentration of an ion with charge \(z\) then \(q = |z| m\) is the charge equivalent concentration of the same ion. Substituting \(q\) into the conductivity contribution of a given ion, we get
\[ \Lambda m = {{z^2 F^2} \over {R T}} D {q \over {|z|}} = \Bigg[ {{{|z| F^2} \over {R T}} D} \Bigg] q \]
Where the bracket term is charge equivalent conductance of the ion. In other words, we can calculate the theoretical specific conductivity by a matrix-vector multiplication, just as we did with ANC. The product X %*% y
computes sums of charge equivalents of ions weighted by the vector y
which contains their equivalent conductances, which again depend on both ion-specific valences and diffusion coefficients.
Diffusion coefficients for different ions can be found in the marelac
package (which you may need to install by install.packages("marelac"))
. Notice that we can also get fundamental constants like \(R\) and \(F\) from this package (check ?Constants
after loading the package). Notice also that the specific conductivity is highly temperature sensitive, and that the convention is to report the value for 25 \(^{\circ}\)C.
library(marelac)
## Loading required package: shape
F <- as.numeric(Constants$F[1]) # 96485.3 C / mol
R <- as.numeric(Constants$gasCt2[1]) # 8.314472 J / mol / K
t <- 25
T <- 273.15 + t # K
z <- c(2, 2, 1, 1, -1, -1, -2, -1)
D <- diffcoeff(S=0, t=t, species=c("Ca", "Mg","Na", "K", "HCO3", "Cl", "SO4", "NO3"))
y <- matrix(((abs(z) *F^2) / (R * T)) * unlist(D), ncol=1) # Recast to column vector
rownames(y) <- c("Ca", "Mg","Na", "K", "HCO3", "Cl", "SO4", "NO3")
knitr::kable(t(y))
Ca | Mg | Na | K | HCO3 | Cl | SO4 | NO3 |
---|---|---|---|---|---|---|---|
0.0060649 | 0.0052801 | 0.0050641 | 0.0050641 | 0.004482 | 0.0077173 | 0.0080215 | 0.0072103 |
Notice that diffcoeff
returns a list which we need to unlist
to turn it into a column vector. Pre-multiplying the equivalent conductance vector with the ionic charge equivalent concentration matrix should give us a calculated conductivity k25.calc
for each of the lake water samples. If we plot this against the measured conductivity (variable K25 from the N95
data frame), we should expects to see points falling on the 1:1 line (add this reference with abline(0,1)
).
N95$K25.calc <- X %*% y
plot(K25 ~ K25.calc, data=N95, pch=19, col=gray(0.2, 0.2))
abline(0, 1, col="red")
Are there systematic deviations between theoretical and measured conductivity? What could be the explanations for such deviations? There is an increasing deviation between measurements and theoretical predictions with increasing conductivities which is probably caused by activity coefficients deviating from unity as ionic strength increases. Remember that molar conductances are basically only valid at infinite dilution.
Focus on the lower left corner by repeating the same plot both xlim
and ylim
set to c(0, 10)
. Add a color coding to the markers such that low to high pH samples are colored from cyan to magenta: col= cm.colors(10)[cut(pH, 10)])
. The cut
function cuts a numerical variable into categorical subranges and returns a factor variable representing the subrange membership of each value. In combination with the cm.colors
function it gives each pH subrange a distinct on a gradient from cyan to magenta. What do you see?
plot(K25 ~ K25.calc, data=N95, pch=19, xlim=c(0, 10), ylim=c(0,10),
col=cm.colors(10)[cut(pH, 10)])
abline(0, 1, col="red")
Cyan dots are generally above the 1:1 line, especially in the most dilute waters. Could this be that there is something else contributing to the conductivity in the most acid lakes?
Could this have something to do with that we have not taken into account the conductivity contribution of H\(^+\) ions in our calculation? Use the same formula for equivalent conductance to calculate it for species="H"
and compare with the other ions.
D.H <- diffcoeff(S=0, t=t, species="H")
(y.H <- ((abs(1) * F^2) / (R * T)) * unlist(D.H))
## H
## 0.03502822
knitr::kable( y.H / t(y))
Ca | Mg | Na | K | HCO3 | Cl | SO4 | NO3 |
---|---|---|---|---|---|---|---|
5.775542 | 6.634068 | 6.916945 | 6.916945 | 7.815249 | 4.538929 | 4.366807 | 4.858073 |
In other words, since the equivalent of conductance of H\(^+\) is 4 to 8 times larger than the other major ions it can make a disproportionate contribution to conductivity, especially in waters with acid waters with low ionic strength.