Tutorial: water level & salinity data

This tutorial covers water level and salinity monitoring data

We will download the relevant data using this package (python-sa-gwdata) – importable as sa_gwdata – and use some other packages for other things:

  • matplotlib, numpy, pandas - used in the background

[1]:
import sa_gwdata

import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
# import contextily as cx
[8]:
plt.rcParams['figure.dpi'] = 100

Water level data

I’ll assume that you have already worked out which well you want to download data for - please see previous tutorials for how to find wells. We’re going to look at data from a monitoring well near Port Lincoln: 6028-536 (also known as obswell FLN029).

First we need to make sure we have correctly identified the well, and obtain its drillhole number:

[3]:
wells = sa_gwdata.find_wells("6028-536")
[4]:
wells
[4]:
['FLN029']
[5]:
df = sa_gwdata.water_levels(wells)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 562 entries, 0 to 561
Data columns (total 21 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   dh_no            562 non-null    int64
 1   network          562 non-null    object
 2   unit_long        562 non-null    int64
 3   aquifer          562 non-null    object
 4   easting          562 non-null    float64
 5   northing         562 non-null    float64
 6   zone             562 non-null    int64
 7   unit_hyphen      562 non-null    object
 8   obs_no           562 non-null    object
 9   obs_date         562 non-null    datetime64[ns]
 10  dtw              561 non-null    float64
 11  swl              561 non-null    float64
 12  rswl             561 non-null    float64
 13  pressure         0 non-null      float64
 14  temperature      0 non-null      float64
 15  dry_ind          0 non-null      float64
 16  anomalous_ind    562 non-null    object
 17  pump_ind         562 non-null    object
 18  measured_during  562 non-null    object
 19  data_source      562 non-null    object
 20  comments         4 non-null      object
dtypes: datetime64[ns](1), float64(8), int64(3), object(9)
memory usage: 92.3+ KB
[6]:
df[['obs_date', 'dtw', 'swl', 'rswl', 'anomalous_ind', 'comments']]
[6]:
obs_date dtw swl rswl anomalous_ind comments
0 1959-02-26 10.67 10.67 1.67 N NaN
1 1960-02-11 10.76 10.76 1.58 N NaN
2 1964-11-04 10.70 10.70 1.64 N NaN
3 1965-10-06 10.96 10.96 1.38 N NaN
4 1966-01-06 11.01 11.01 1.33 N NaN
... ... ... ... ... ... ...
557 2020-10-11 12.80 12.22 0.12 N NaN
558 2021-03-20 12.89 12.31 0.03 N NaN
559 2021-10-26 12.87 12.29 0.05 N NaN
560 2022-04-11 12.96 12.38 -0.04 N NaN
561 2022-10-07 12.90 12.32 0.02 N NaN

562 rows × 6 columns

[10]:
fig = plt.figure()
ax = fig.add_subplot(111)

ax.plot(df.obs_date, df.swl, marker='x')
ax.invert_yaxis()
_images/430-tutorial-water-monitoring-data_9_0.png

Anomalous data points

Note the three outliers. These have actually been flagged as anomalous:

[11]:
df[df.obs_date.dt.year == 2014][['obs_date', 'dtw', 'swl', 'rswl', 'anomalous_ind', 'comments']]
[11]:
obs_date dtw swl rswl anomalous_ind comments
493 2014-01-13 12.61 12.03 0.31 N NaN
494 2014-02-10 12.64 12.06 0.28 N NaN
495 2014-03-11 12.67 12.09 0.25 N NaN
496 2014-04-09 12.68 12.10 0.24 N NaN
497 2014-05-04 12.70 12.12 0.22 N NaN
498 2014-05-05 12.70 12.12 0.22 N NaN
499 2014-06-04 12.71 12.13 0.21 N NaN
500 2014-07-28 12.61 12.03 0.31 N NaN
501 2014-08-25 12.60 12.02 0.32 N NaN
502 2014-09-22 16.58 16.00 -3.66 Y Transcription error
503 2014-10-22 12.59 12.01 0.33 N NaN
504 2014-11-17 12.59 12.01 0.33 N NaN
505 2014-12-15 12.59 12.01 0.33 N NaN

And if you want, you can opt to always exclude “anomalous” readings, although it is should be noted the definition of “anomalous” is somewhat subjective:

[12]:
df = sa_gwdata.water_levels(wells, anomalous=False)
df[df.obs_date.dt.year == 2014][['obs_date', 'dtw', 'swl', 'rswl', 'anomalous_ind', 'comments']]
[12]:
obs_date dtw swl rswl anomalous_ind comments
493 2014-01-13 12.61 12.03 0.31 N NaN
494 2014-02-10 12.64 12.06 0.28 N NaN
495 2014-03-11 12.67 12.09 0.25 N NaN
496 2014-04-09 12.68 12.10 0.24 N NaN
497 2014-05-04 12.70 12.12 0.22 N NaN
498 2014-05-05 12.70 12.12 0.22 N NaN
499 2014-06-04 12.71 12.13 0.21 N NaN
500 2014-07-28 12.61 12.03 0.31 N NaN
501 2014-08-25 12.60 12.02 0.32 N NaN
502 2014-10-22 12.59 12.01 0.33 N NaN
503 2014-11-17 12.59 12.01 0.33 N NaN
504 2014-12-15 12.59 12.01 0.33 N NaN

Groundwater level parameters

There are three different water level parameters:

Depth to Water (dtw/DTW)

This is in metres measured below a reference point. Increasing numbers indicate an increasing depth to water. Negative numbers indicate flowing artesian conditions. The reference point’s true elevation above ground surface can change i.e. if a casing standpipe is chopped off or installed, so although this number is what is in effect measured in the real world, it should not be used for analysis.

Standing Water Level (swl/SWL)

This is the Depth to Water, automatically corrected such that it represents a depth below ground level. Increasing numbers indicate an increasing depth to water. Negative numbers indicate flowing artesian conditions. The SWL and DTW values are only different when well elevation data is available and the two elevation values (of the reference point, and the ground) are different - the difference is used to calculate the SWL. If they are the same (e.g. maybe the elevation was derived from a digital elevation model), or no elevation data is available, then the SWL will be present but it will be identical to the DTW. I recommend that you always use the SWL value (or RSWL, see below).

Reduced Standing Water Level (rswl/RWL)

This value has been corrected to represent the groundwater level measured above Australian Height Datum (AHD). Increasing numbers indicate a decreasing depth to water. RSWL values are only present when we have an elevation survey for that well. This is obviously the correct value to use for most purposes, if it is available.

Note that there is no distinguishing between physical groundwater levels (i.e. the water table) and water levels as measured in a well which penetrates and provides access to a confined aquifer (i.e. a groundwater pressure level). Even wells which access confined groundwater with flowing artesian conditions, where the pressure level is measured using a pressure gauge at surface, have data reported using this method as “Depth to Water” (which would in that case be a negative number).

To compare all of these parameters, see them on a chart:

[14]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(df.obs_date, df.dtw, label='dtw', color='tab:blue')
ax.plot(df.obs_date, df.swl, label='swl', color='tab:green', lw=12, alpha=0.3)
ax.plot([], [], label='rswl', color='tab:red', alpha=1, lw=2, ls=':')
ax.invert_yaxis()
span = 2.5
ax_top = 10.6
ax2_top = 1.75
ax.set_ylim(ax_top + span, ax_top)
ax.set_ylabel('DTW & SWL', color='black')
ax2 = ax.twinx()
ax2.plot(df.obs_date, df.rswl, label='rswl', color='tab:red', alpha=1, lw=2, ls=':')
ax2.set_ylim(ax2_top - span, ax2_top)
ax2.set_ylabel('RSWL', color='red')
ax.legend()
fig.savefig("figures/groundwater-level-parameters.png", bbox_inches='tight')
_images/430-tutorial-water-monitoring-data_16_0.png

Note the change in casing reference point around 1990 - clearly a casing standpipe was installed, because overnight, the depth to water increased by about one metre, and the new measurements were being made from the top of the standpipe, instead of ground level.

Measured during!

There is a metadata property attached to water level and salinity data called “measured during”. This indicates what kind of activity the data was obtained during, and it is a good way to filter out irrelevant measurements from relevant ones in terms of monitoring.

These are the values that measured_during can take:

[15]:
meas_during_lut = {
    'A': 'Aquifer Test',
    'D': 'Drilling',
    'F': 'Field Survey',
    'S': 'Final Sample on drilling completion',
    'G': 'Geophysical Logging',
    'L': 'Landowner Sample',
    'M': 'Monitoring',
    'R': 'Rehabilitation',
    'U': 'Unknown',
    'W': 'Well Yield',
}

An example is from well 6628-5103

[16]:
wells = sa_gwdata.find_wells('6628-5103')
[17]:
df = sa_gwdata.water_levels(wells)
[18]:
df.measured_during.value_counts()
[18]:
M    142
A      3
Name: measured_during, dtype: int64
[19]:
fig = plt.figure()
ax = fig.add_subplot(111)
df[df.measured_during == 'M'].plot(ax=ax, x='obs_date', y='swl', color='tab:blue', marker='+', lw=0.5, label='Monitoring')
df[df.measured_during == 'A'].plot(ax=ax, x='obs_date', y='swl', color='tab:red', marker='x', lw=0.5, label='Aquifer test')
df.plot(ax=ax, x='obs_date', y='swl', lw=0.1, color='grey', label='all data')
ax.legend()
ax.invert_yaxis()
ax.set_ylabel('swl')
[19]:
Text(0, 0.5, 'swl')
_images/430-tutorial-water-monitoring-data_24_1.png

As you can see, filtering to only retain measured during “M” (monitoring) is often a good idea for analysis of water level data.

Salinity data

Salinity tends to be monitored from different wells than water level.

Let’s look at LKW039, a well from the other side of Eyre Peninsula at Coffin Bay.

[20]:
wells = sa_gwdata.find_wells("LKW 39")
[21]:
wells
[21]:
['LKW039']
[22]:
df = sa_gwdata.salinities(wells)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 168 entries, 0 to 167
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   dh_no            168 non-null    int64
 1   network          168 non-null    object
 2   aquifer          168 non-null    object
 3   unit_hyphen      168 non-null    object
 4   unit_long        168 non-null    int64
 5   obs_no           168 non-null    object
 6   collected_date   168 non-null    datetime64[ns]
 7   collected_time   7 non-null      object
 8   tds              168 non-null    int64
 9   ec               168 non-null    int64
 10  ph               20 non-null     float64
 11  sample_type      168 non-null    object
 12  anomalous_ind    168 non-null    object
 13  test_place       86 non-null     object
 14  extract_method   165 non-null    object
 15  measured_during  168 non-null    object
 16  data_source      168 non-null    object
 17  easting          168 non-null    float64
 18  northing         168 non-null    float64
 19  zone             168 non-null    int64
dtypes: datetime64[ns](1), float64(3), int64(5), object(11)
memory usage: 26.4+ KB

These are the important fields:

  • TDS (Total dissolved solids in mg/L): almost always estimated from the EC measurement by a standard formula - TODO document

  • EC (Electrical conductivity at 25 deg C in uS/cm): the electrical conductivity, corrected for temperature

  • Anomalous indicator Y/N: as for water level above

  • Test place: location that the EC was measured: values include:

    • U = Unknown

    • F = During fieldwork

    • RP = Regency Park i.e. the water test room

  • Extraction method: how the water sample was obtained for testing.

  • Measured during: see discussion above

  • Data source: where the data came from

[23]:
df[['collected_date', 'tds', 'ec', 'anomalous_ind', 'test_place', 'extract_method', 'measured_during', 'data_source']]
[23]:
collected_date tds ec anomalous_ind test_place extract_method measured_during data_source
0 1985-03-25 392 712 N U PUMP W DEWNR
1 1985-03-25 392 712 N U PUMP W DEWNR
2 1985-03-25 393 714 N U PUMP W DEWNR
3 1985-03-25 391 710 N U PUMP W DEWNR
4 1985-03-25 397 722 N U PUMP W DEWNR
... ... ... ... ... ... ... ... ...
163 2015-04-20 1049 1900 N NaN PUMP M DEWNR
164 2016-04-18 1077 1950 N F PUMP M DEWNR
165 2019-12-14 1127 2040 N RP PUMP M DEW
166 2020-10-18 1076 1948 N RP PUMP M DEW
167 2021-10-26 1116 2020 N RP PUMP M DEW

168 rows × 8 columns

A key field for salinity data is certainly the extraction method, and usually values like PUMP and AIRL (airlifting) are the main ones you should filter for:

[24]:
df.extract_method.fillna("NA").value_counts()
[24]:
BAIL    134
PUMP     31
NA        3
Name: extract_method, dtype: int64
[25]:
fig = plt.figure()
ax = fig.add_subplot(111)
df[df.extract_method == 'PUMP'].plot(
    ax=ax, x='collected_date', y='tds', color='tab:blue', marker='s', ms=10, ls='none', label='PUMP', mfc='skyblue'
)
df[df.extract_method == 'BAIL'].plot(
    ax=ax, x='collected_date', y='tds', color='tab:orange', marker='^', ms=10, ls='none', label='BAIL', mfc='none'
)
df[pd.isnull(df.extract_method)].plot(
    ax=ax, x='collected_date', y='tds', color='black', marker='*', ms=15, ls='none', label='null', mfc='lightgrey'
)
ax.legend()
ax.set_ylabel('tds')
[25]:
Text(0, 0.5, 'tds')
_images/430-tutorial-water-monitoring-data_34_1.png
[ ]: