
Salinity Prediction via Specific Conductivity Data Processing: A Data Integration Study on Transitional Waters

Author list: Rita Saiu, Gianfranco Algieri

Published on: 31/03/2026 Document

1. Introduction.

The geological evolution and morphology of the Po Delta channels have been extensively researched.

Current drought prediction approaches for the Italian Po River rely on systems that exploit rainfall data across the Alpine region and the Po Valley, specifically the Water Exploitation Index Plus (WEI+) [Casadei, Peppoloni, Pierleoni, 2020]. Over the preceding five centuries, sedimentation has fundamentally modified the morphology of the depositional areas. Numerous studies have addressed the Po Delta and its sediments [Correggiari, Cattaneo, Trincardi, 2005], including a 1997 study on freshwater input and seasonal salinity fluctuations [Artegiani, Bregant, Paschini, Pinardi, Raicich, Russo, 1997]. However, the impacts of climate change on the physicochemical parameters at the river mouth remain poorly understood.

This study focuses on the Sacca di Scardovari lagoon (Province of Rovigo, Veneto), which is adjacent to the Goro lagoon (Province of Ferrara, Emilia-Romagna). It is the largest lagoon in the Po Delta region and a crucial hub for mussel and clam aquaculture [Pesca e Molluschicoltura nel Delta del Po: il Consorzio Cooperative Pescatori del Polesine]. Two sites within the Sacca di Scardovari are investigated: Scardovari Interno (SI) and Scardovari Mare (SM). They were selected because they lie at different distances from the sea. This difference permits a kinetic analysis of salinisation processes.

Italian Regional Environmental Protection Agencies (ARPAs) have historically collected extensive water quality data. The datasets used in this study reflect seasonal and interannual fluctuations in temperature, conductivity and salinity. Salinity, in turn, is a key indicator of desertification.

To the best of the authors' knowledge, this study is the first attempt to implement predictive modelling based on conductivity and salinity time series in the Italian context. Descriptive assessments of monitoring parameters have been documented, most notably in the report published by the Veneto Regional Authority in October 2013, but predictive frameworks remain unexplored.

This paper presents the findings of a longitudinal study of approximately 271,000 records collected from 2010 onwards. The core methodology is the longitudinal processing of measurements gathered over more than ten years of monitoring. The findings demonstrate the feasibility of predicting salinity trends, and by extension drought conditions, with a lead time of several hours.

2. Geographical Context.

This section provides a macro-regional and geographical overview of the study site. The Sacca di Scardovari is a lagoon at the mouth of the Po River, within the Po Delta on the northwestern coast of the Adriatic Sea (44.950°N, 12.417°E). The lagoon is an integral part of the Po Delta Regional Park, a designated nature reserve. Mussel aquaculture is the primary productive activity of the local economy. The two data collection sites are close to each other (approximately 8 kilometres apart) and are fed by interconnected water bodies.

Figure 1. Location of the study area: Italy and the Sacca di Scardovari in the Parco del Delta del Po. (Photos: ESA, Copernicus.)

3. Materials and Methods

A data processing pipeline generally comprises a sequence of discrete stages designed to transform heterogeneous raw inputs into structured, actionable datasets. The stages followed in this study are described in the subsections below.

3.1 Data Acquisition

A comprehensive set of in-situ measurements of transitional waters from the Po Delta buoys ("Boe"), covering 2010 to 2021, was collated. The initial data processing was conducted in May 2022. At that time the datasets had last been updated on 11th July 2022, and the 2021 datasets were not yet available. The 2021 datasets were subsequently obtained from the database of the Veneto Regional Agency for Environmental Prevention and Protection (ARPAV).

The data were provided in OpenDocument Spreadsheet (ODS) format and converted into comma-separated values (.csv) files.

The large volume of historical data supports a robust environmental characterisation. With an average size of 6 MB per series, the data volume ensures statistical robustness and permits a representative analysis of the environmental dynamics at the Scardovari site.

3.2 Data Preprocessing

Data preprocessing was especially time-consuming, for three main reasons. First, automated loading protocols were absent in the early years, so the records contained data entry errors that required systematic reconciliation by ARPAV. Second, formatting discrepancies required rigorous standardisation to align the datasets and ensure data integrity. Third, the analysis was complicated by significant temporal discontinuities in the data, and these gaps had to be managed in order to preserve statistical significance.
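The study does not state how the temporal gaps were handled. As a minimal sketch of a first step, assuming hourly sampling, the following locates stretches where consecutive readings are further apart than expected. The function name `find_gaps` and the sample timestamps are hypothetical.

```python
import pandas as pd

def find_gaps(timestamps, expected_step="1h"):
    """Return (last_reading_before_gap, first_reading_after_gap) pairs."""
    ts = pd.Series(pd.to_datetime(timestamps)).sort_values().reset_index(drop=True)
    step = pd.Timedelta(expected_step)
    deltas = ts.diff()  # time elapsed since the previous reading
    gap_positions = deltas[deltas > step].index
    return [(ts[i - 1], ts[i]) for i in gap_positions]

# Hypothetical hourly readings with the 03:00-05:00 measurements missing.
readings = [
    "2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00",
    "2021-01-01 06:00", "2021-01-01 07:00",
]
gaps = find_gaps(readings)
print(gaps)
```

Once the gaps are located, whether to interpolate, drop, or flag them is a separate modelling decision that depends on their length.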

3.3 Data Processing Software

All processing was done in Python, version 3.1, mainly using Google Colab Pro and the scikit-learn library.

Graphs were created with the free Datawrapper.de tool.

The Water Quality Prediction Dataset of the University of California, Irvine was selected for the training phase. The primary objective of this dataset is to predict spatio-temporal water quality, in terms of pH values, for the following day from historical water measurement indices. The input features consist of 11 common indices, including volume of dissolved oxygen, temperature, and conductivity.

3.4 Data Transformation

The original datasets contain up to 24 measurements per day (i.e. one per hour). Daily means were calculated for data visualisation.
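The daily averaging can be sketched with pandas as follows. The values are synthetic and only the column name Salinita follows the naming used in this study.

```python
import pandas as pd

# Synthetic hourly salinity readings covering two days (illustrative values).
hourly = pd.DataFrame(
    {"Salinita": [20.0] * 24 + [30.0] * 24},
    index=pd.date_range("2021-01-01", periods=48, freq="h"),
)

# Average all readings that fall on the same calendar day.
daily_mean = hourly.resample("D").mean()
print(daily_mean)
```

Days with fewer than 24 readings are still averaged over whatever readings exist, so sparse days may deserve separate inspection.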

Measurements of the key physicochemical parameters (temperature, pH, conductivity, salinity, and dissolved oxygen) were obtained with a multiparameter probe, an instrument that acquires several parameters in a single measurement. Although sampling occasionally occurred at depths of both 0.5 m and 1.0 m, depth was not considered a primary variable in the data selection process. Where redundant measurements were identified, the 1.0 m reading was systematically retained in the final dataset and the surface-level reading (0.5 m) was excluded.
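The depth rule can be illustrated with a short sketch. The column names `timestamp` and `depth_m`, the helper `prefer_deeper_reading`, and the sample values are assumptions made for the example and do not come from the original datasets.

```python
import pandas as pd

def prefer_deeper_reading(df, time_col="timestamp", depth_col="depth_m"):
    """Keep one row per timestamp; when a timestamp was sampled at more
    than one depth, keep the deepest reading (1.0 m over 0.5 m)."""
    ordered = df.sort_values([time_col, depth_col])
    return ordered.drop_duplicates(subset=time_col, keep="last").reset_index(drop=True)

samples = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2015-06-01 10:00", "2015-06-01 10:00", "2015-06-01 11:00"]
    ),
    "depth_m": [0.5, 1.0, 0.5],
    "Salinita": [24.1, 25.3, 24.8],
})
selected = prefer_deeper_reading(samples)
print(selected)
```

Timestamps sampled at a single depth are kept unchanged, which matches the statement that depth was not a primary selection variable.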

Dissolved oxygen was excluded from the final analysis, as it fell outside the scope of the salinity characterisation study.

The 11 datasets from each locality were then consolidated into a single dataset, yielding one continuous time series per locality that covers eleven years of data collection.

schema

3.5 Selected Variables

The original datasets were reduced to a final number of four variables:

Variable | Name | Name in the original datasets | Name after renaming during processing
X1 | Temperature | Temperatura (dell’acqua) | Temperatura
X2 | pH | pH | pH
X3 | Conductivity | Conducibilità Specifica a 25°C / Conduttanza Specifica a 25°C | Conducibilita_Specifica_a_25C
y4 | Salinity | Salinità | Salinita
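As an illustration, the renaming and selection can be sketched with pandas. The exact header strings in the source files may differ from those in the table, and the dissolved oxygen column name used here is hypothetical.

```python
import pandas as pd

# Original headers (left) mapped to the names used during processing (right).
RENAME = {
    "Temperatura (dell'acqua)": "Temperatura",
    "Conducibilità Specifica a 25°C": "Conducibilita_Specifica_a_25C",
    "Conduttanza Specifica a 25°C": "Conducibilita_Specifica_a_25C",
    "Salinità": "Salinita",
}
SELECTED = ["Temperatura", "pH", "Conducibilita_Specifica_a_25C", "Salinita"]

def select_variables(df):
    """Rename the original headers and keep only the four selected variables."""
    return df.rename(columns=RENAME)[SELECTED]

# Illustrative rows; "Ossigeno Disciolto" is an assumed header for dissolved oxygen.
raw = pd.DataFrame({
    "Temperatura (dell'acqua)": [12.4, 12.6],
    "pH": [8.1, 8.0],
    "Conducibilità Specifica a 25°C": [41.2, 40.8],
    "Salinità": [26.3, 26.0],
    "Ossigeno Disciolto": [9.1, 9.0],
})
clean = select_variables(raw)
print(clean.columns.tolist())
```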

Electrical conductivity is a reliable method for characterising water [Rossum, 1949]. However, the reproducibility of conductivity measurements depends on the salt species present, their concentration, and the temperature [Blaine McCleskey, Nordstrom, Ryan, 2011]. Consequently, saltwater samples should be compared only when the dissolved salt species can be assumed to be comparable. In this study that assumption is supported by the fact that both sampling sites are located within interconnected water bodies. In addition, because conductivity depends nonlinearly on temperature, all measurements must be compensated to the reference temperature of 25°C, as stipulated in ISO 7888:1985.
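The conductivity values used in this study are already referenced to 25°C. For readers unfamiliar with the idea, the sketch below shows a simplified linear compensation, a common approximation that is not the nonlinear correction of ISO 7888. The coefficient `alpha` (about 2% per °C) is an assumed typical value and the readings are hypothetical.

```python
def compensate_to_25c(conductivity, temperature_c, alpha=0.02):
    """Simplified linear temperature compensation to 25 °C.

    alpha is an assumed coefficient (fraction per °C), not a value
    taken from the study or from ISO 7888.
    """
    return conductivity / (1.0 + alpha * (temperature_c - 25.0))

# A reading taken at the reference temperature is unchanged.
k25_same = compensate_to_25c(50.0, 25.0)

# A reading taken in colder water is scaled up to its 25 °C equivalent.
k25_cold = compensate_to_25c(45.0, 20.0)
print(k25_same, k25_cold)
```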

The following two correlation matrices are presented for examination:

Scardovari Interno | pH | Temperature | Conductivity | Salinity
pH | 1.00 | 0.11 | 0.04 | 0.03
Temperature | 0.11 | 1.00 | -0.02 | 0.03
Conductivity | 0.04 | -0.02 | 1.00 | 0.99
Salinity | 0.03 | 0.03 | 0.99 | 1.00

Scardovari Mare | pH | Temperature | Conductivity | Salinity
pH | 1.00 | 0.14 | -0.04 | -0.03
Temperature | 0.14 | 1.00 | 0.13 | 0.18
Conductivity | -0.04 | 0.13 | 1.00 | 0.98
Salinity | -0.03 | 0.18 | 0.98 | 1.00
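Matrices of this kind can be produced with the pandas `corr` method. The sketch below uses synthetic data, not the study's measurements, in which salinity is constructed to track conductivity closely, so only the structure of the output is meaningful.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
conductivity = rng.uniform(20.0, 50.0, n)

# Synthetic stand-in for one site's records (illustrative values only).
frame = pd.DataFrame({
    "pH": rng.normal(8.0, 0.2, n),
    "Temperatura": rng.uniform(5.0, 28.0, n),
    "Conducibilita_Specifica_a_25C": conductivity,
    "Salinita": 0.65 * conductivity + rng.normal(0.0, 0.5, n),
})

# Pairwise Pearson correlations, rounded to two decimals as in the tables.
corr = frame.corr(method="pearson").round(2)
print(corr)
```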

4. Case Study

The dataset is used to exploit spatial dependence within a predictive framework. The analysis assumes conditional independence determined by spatial contiguity, consistent with the SADL-I autoregressive model [Zhao, Gkountouna, Pfoser, 2019]. This spatial regression approach treats all elements as globally interconnected while explicitly accounting for spatial autocorrelation, whereby nearby objects interact more strongly than distant ones.

In the present model, the sampling sites Scardovari Interno (a) and Scardovari Mare (b) are represented as nodes in the following contiguity matrix:

  | a | b
a | 0 | 1
b | 1 | 0

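The off-diagonal entries equal to 1 show that the two nodes are spatially contiguous with each other, while the diagonal entries equal to 0 show that neither node is linked to itself.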
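In a contiguity matrix, an entry of 1 conventionally marks a pair of sites treated as neighbours and 0 marks the absence of a link. The sketch below builds the two-site matrix and computes a spatial lag. The row normalisation is a common convention assumed here, and the salinity values are hypothetical.

```python
import numpy as np

sites = ["a", "b"]  # a = Scardovari Interno, b = Scardovari Mare
W = np.array([[0, 1],
              [1, 0]], dtype=float)

# Row-normalise so that each row of weights sums to 1.
W_norm = W / W.sum(axis=1, keepdims=True)

# Spatial lag: for each site, the weighted mean of its neighbours' values.
salinity = np.array([22.0, 28.0])  # hypothetical values for a and b
spatial_lag = W_norm @ salinity
print(dict(zip(sites, spatial_lag)))
```

With only two sites, the spatial lag of each site is simply the value observed at the other one.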

The SADL-I model was selected on the assumption that it can effectively represent these two localities, which belong to the same catchment area but differ in their distance from the sea.

The estimation of expected future values followed a three-step procedure. First, individual variables were extracted from the Scardovari Mare dataset to enable univariate autoregressive modelling. Second, in a diagnostic phase, each variable was inspected visually via lag plots to characterise autocorrelation structures in the historical time series.

Finally, the predictive performance of the autoregressive framework was validated by computing the Pearson correlation coefficient between the model-derived estimates and the empirical observations.
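A lag plot is a scatter of each value against the value one step earlier, and its numerical counterpart is the correlation between the series and its lagged copy. The sketch below, with a hypothetical helper `lag_correlation` and synthetic series, contrasts a smooth daily cycle with pure noise.

```python
import numpy as np

def lag_correlation(series, lag=1):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    x = np.asarray(series, dtype=float)
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])

hours = np.arange(24 * 7)

# Synthetic hourly series with a daily cycle (strongly autocorrelated).
smooth = 25.0 + 3.0 * np.sin(2 * np.pi * hours / 24.0)

# Synthetic white noise (no autocorrelation expected).
noise = np.random.default_rng(1).normal(size=hours.size)

r_smooth = lag_correlation(smooth)
r_noise = lag_correlation(noise)
print(round(r_smooth, 3), round(r_noise, 3))
```

A value close to 1 for a given variable suggests that an autoregressive model is worth attempting, which is what the visual check is meant to establish.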

4.1 Data Visualisation

The salinity classes of the transitional waters at the two localities are as follows:

Locality | Class boundaries | Salinity classes
Scardovari Interno | 80% <30 | oligohaline, mesohaline, polyhaline
Scardovari Interno | 20% >30 | euhaline, hyperhaline
Scardovari Mare | 72% <30 | oligohaline, mesohaline, polyhaline
Scardovari Mare | 28% >30 | euhaline, hyperhaline

Of the samples from Scardovari Mare, approximately 72% had a salinity below 30 and 28% a salinity above 30. Of the samples from Scardovari Interno, approximately 80% had a salinity below 30 and the remaining 20% a salinity above 30.
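Shares of this kind can be computed as sketched below. The sample values are hypothetical, and the treatment of readings exactly equal to 30 (counted in the upper class here) is an assumption, since the text only distinguishes values below and above 30.

```python
import pandas as pd

def salinity_class_shares(salinity, boundary=30.0):
    """Percentage of readings below the boundary and at or above it."""
    s = pd.Series(salinity, dtype=float).dropna()
    below = float((s < boundary).mean() * 100.0)
    return {"below": round(below, 1), "at_or_above": round(100.0 - below, 1)}

# Five hypothetical salinity readings, four of them below 30.
shares = salinity_class_shares([12.0, 18.5, 25.0, 29.9, 33.0])
print(shares)
```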

Historical daily average of salinity (charts 1 and 2). A time-series analysis of salinity levels was performed on a robust dataset of approximately 271,000 records, covering an eleven-year monitoring horizon (2010–2021).

Data visualisation of pH measurements at Scardovari Mare and Scardovari Interno (charts 3 and 4).

A statistical investigation would be needed to establish whether the collected pH data show an alkalinisation phenomenon [Kaushal, Likens, Pace, Utz, Haq, Gorman, Grese, 2018], particularly at the Scardovari Interno locality. The decision on whether to carry out such an investigation is reserved.

Data visualisation of temperature (°C) measurements at Scardovari Interno and Scardovari Mare (charts 5 and 6).

5. Data Processing

The analysis begins with data pre-processing and transformation, aimed at enabling the cross-validation of observations between the two monitoring sites within the Sacca di Scardovari. To this end, the datasets were loaded into a Google Colaboratory environment, with particular attention to standardising the temporal variables (date-time fields) so that the two series are chronologically synchronised and data integrity is preserved.
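The study does not describe the date-time formats found in the source files. As a minimal sketch under assumed formats, the following parses two differently formatted timestamp columns and aligns the sites on their common timestamps. The column name `Data`, both format strings, and the values are hypothetical.

```python
import pandas as pd

def standardise_time(df, column, fmt):
    """Parse a text date-time column and use it as a sorted index."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column], format=fmt)
    return out.set_index(column).sort_index()

interno = pd.DataFrame({
    "Data": ["01/01/2021 00:00", "01/01/2021 01:00"],
    "Salinita": [21.0, 21.4],
})
mare = pd.DataFrame({
    "Data": ["2021-01-01 01:00:00", "2021-01-01 02:00:00"],
    "Salinita": [27.9, 28.3],
})

si = standardise_time(interno, "Data", "%d/%m/%Y %H:%M")
sm = standardise_time(mare, "Data", "%Y-%m-%d %H:%M:%S")

# Keep only the timestamps present at both sites.
aligned = si.join(sm, how="inner", lsuffix="_SI", rsuffix="_SM")
print(aligned)
```

Passing an explicit `format` avoids silent day/month swaps, which is a frequent source of misalignment when files were entered by hand.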

A linear regression model was developed using the historical time series collected daily over an eleven-year period by the Scardovari Mare probe. The model uses the independent variables Temperatura, Conducibilita_Specifica_a_25C, and pH to estimate Salinita. To assess the model on unseen data, the dataset was divided using a 75/25 split: 75% of the observations were allocated to training, and the remaining 25% were reserved for hold-out evaluation and validation.
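A set-up of this kind can be sketched with scikit-learn as follows. The data are synthetic stand-ins rather than the Scardovari Mare series, and the `random_state` values are arbitrary choices made for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

FEATURES = ["Temperatura", "Conducibilita_Specifica_a_25C", "pH"]
TARGET = "Salinita"

# Synthetic records in which salinity follows conductivity (illustrative only).
rng = np.random.default_rng(42)
n = 400
data = pd.DataFrame({
    "Temperatura": rng.uniform(5.0, 28.0, n),
    "Conducibilita_Specifica_a_25C": rng.uniform(20.0, 50.0, n),
    "pH": rng.normal(8.0, 0.2, n),
})
data[TARGET] = 0.65 * data["Conducibilita_Specifica_a_25C"] + rng.normal(0.0, 0.5, n)

# 75% of the rows for training, 25% held out for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    data[FEATURES], data[TARGET], test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination on held-out rows
print(len(X_train), len(X_test), round(r2, 4))
```

Note that `train_test_split` shuffles rows by default. For time series, a chronological split is often preferred, and the text does not say which was used.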

The linear relationship between the independent variables and the dependent variable is expressed by the following equation:

[Equation]
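The fitted coefficients are not reproduced in this text. Using the variable labels of Section 3.5 (X1 temperature, X2 pH, X3 conductivity), a multiple linear regression of this kind has the general form below, where the beta terms stand for the intercept and the fitted coefficients, whose values are not known here:

```latex
\hat{y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3
```

Here \hat{y} denotes the estimated salinity.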

The linear trend of the target variable is illustrated in the following scatter plot.

[Scatter plot]

In order to validate the reliability of the regression framework, the following performance indicators are utilised:

the root-mean-square error (RMSE), equal to 1.0349;

R² (coefficient of determination), the regression score function, equal to 0.9655. The best possible score is 1.0. The score can be negative, because a model can perform arbitrarily worse than a constant predictor. A constant model that always predicts the expected value of y, disregarding the input features, would obtain an R² score of 0.0.
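The behaviour of these indicators can be checked with a short numerical sketch. It assumes the first indicator is a root-mean-square error, and the observed values are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

observed = np.array([20.0, 25.0, 30.0, 35.0])

# A constant model predicting the mean scores exactly 0.
r2_constant = r2(observed, np.full_like(observed, observed.mean()))

# Predictions in reversed order are worse than the mean, so the score is negative.
r2_reversed = r2(observed, observed[::-1])

# A uniform offset of 1 gives an RMSE of 1.
rmse_offset = rmse(observed, observed + 1.0)
print(r2_constant, r2_reversed, rmse_offset)
```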

Most of the dataset values align with the straight line described by the fitted linear regression equation.

During the validation phase, the linear regression model derived above was applied to the observations from the Scardovari Interno probe. This cross-site validation had two objectives: to assess the model's spatial generalisability, and to determine whether the salinity dynamics at the interior station follow the same trend as those recorded at the Scardovari Mare site. Evaluating the model's predictive performance on this independent dataset thus verifies whether the environmental drivers of salinity are consistent across both lagoon localities.

The predictive residuals were less than 0.86% of the full-scale range, indicating a high degree of model fidelity. To quantify the precision of the estimates, the model's performance was further evaluated using the following standardised statistical dispersion metrics:

[Metrics]
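The cross-site check can be sketched as follows: fit on one site, predict at the other, and express the absolute residuals as a percentage of the observed range. How the study defines the full-scale range is not stated, so taking it as the maximum minus the minimum of the observed salinity is an assumption, and both sites are simulated here.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]

def residual_pct_of_range(y_true, y_pred):
    """Absolute residuals as a percentage of the observed range (assumed definition)."""
    full_scale = y_true.max() - y_true.min()
    return np.abs(y_true - y_pred) / full_scale * 100.0

rng = np.random.default_rng(7)

def synthetic_site(n, cond_low, cond_high):
    """Simulated temperature, conductivity, pH and salinity for one site."""
    temp = rng.uniform(5.0, 28.0, n)
    cond = rng.uniform(cond_low, cond_high, n)
    ph = rng.normal(8.0, 0.2, n)
    salinity = 0.65 * cond + rng.normal(0.0, 0.1, n)
    return np.column_stack([temp, cond, ph]), salinity

X_mare, y_mare = synthetic_site(300, 25.0, 50.0)
X_interno, y_interno = synthetic_site(300, 15.0, 45.0)

coef = fit_linear(X_mare, y_mare)  # train on the seaward site
pct = residual_pct_of_range(y_interno, predict_linear(coef, X_interno))
mean_pct = float(pct.mean())
print(round(mean_pct, 3))
```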

The reliability of the resulting model was demonstrated in two ways: first, by estimating the salinity values of a dataset containing measurements of temperature, conductivity, and pH using a spatial correlation algorithm; second, by estimating future values for all the quantities in the Scardovari Mare dataset using linear autoregression.

Future values were predicted in three steps. First, the variables to be forecast by autoregression were extracted individually from the Scardovari Mare dataset. Second, a preliminary visual check (lag plot) was carried out for each variable to ascertain whether the historical series was autocorrelated. Finally, for each variable, the Pearson correlation coefficient was calculated between the series produced by linear autoregression and the historical series. The Pearson correlation coefficient is the ratio of the covariance of two sets of values to the product of their standard deviations. It summarises the relationship between the two sets in a single number ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation). Values approaching zero indicate low correlation, whilst values above 0.5 or below -0.5 indicate high correlation.
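The steps above can be sketched with NumPy. The autoregressive order and the fitting method used in the study are not stated, so the least-squares fit with 24 lags below is an assumption, and the hourly series is synthetic.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares fit of x[t] = c + a1*x[t-1] + ... + ap*x[t-p]."""
    x = np.asarray(series, dtype=float)
    rows = [x[t - order:t][::-1] for t in range(order, len(x))]  # newest lag first
    A = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(A, x[order:], rcond=None)
    return coef

def forecast_ar(history, coef, steps):
    """Recursive forecast: each prediction is fed back in as a new lag."""
    order = len(coef) - 1
    values = list(np.asarray(history, dtype=float)[-order:])
    out = []
    for _ in range(steps):
        lags = values[-order:][::-1]
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        out.append(nxt)
        values.append(nxt)
    return np.array(out)

def pearson(a, b):
    """Covariance of a and b divided by the product of their standard deviations."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

# Synthetic hourly series with a daily cycle: 7 days to fit, 1 day held out.
rng = np.random.default_rng(3)
hours = np.arange(24 * 8)
series = 25.0 + 3.0 * np.sin(2 * np.pi * hours / 24.0) + rng.normal(0.0, 0.05, hours.size)
train, held_out = series[:-24], series[-24:]

coef = fit_ar(train, order=24)
predicted = forecast_ar(train, coef, steps=24)
r = pearson(predicted, held_out)
print(round(r, 4))
```

Because each forecast step reuses earlier predictions, errors accumulate with the horizon, which is consistent with shorter forecast windows performing better.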

The initial 31-day trial (744 measurements of a single variable) showed a significant correlation at the outset, but not across the entire series.

The subsequent trials covered a 7-day period (168 measurements), a 3-day period (72 measurements), and a single day (24 measurements), and produced Pearson coefficients ranging from 0.9658 to 0.9990.

The most robust predictive performance was observed in the 1-day salinity forecast, particularly when the analysis was restricted to a specific temporal window. The model showed the best agreement for intraday observations recorded between 03:30 (observation no. 8) and 22:30 (observation no. 20). This interval yielded the closest alignment between predicted and observed values, underlining the model's ability to capture diurnal salinity fluctuations.

The following two graphs compare model-generated predictions with observational data for 1 January 2021, for specific conductivity (25°C) and salinity at Scardovari Mare.

[Graph: specific conductivity]
[Graph: salinity]

Despite the limited number of variables available and the inherent limitations of the linear and autoregressive algorithms, the salinity trends at Scardovari Mare were successfully characterised.

The most reliable forecast was obtained within a limited time slot (03:30-22:30), outside of which the predicted values differed greatly from those observed.

The result obtained so far can be regarded as a preliminary step towards the creation of an edge monitoring dashboard for the environmental variables.

6. Conclusions

The primary distinction between the present study and that of Thorslund and van Vliet [Thorslund, van Vliet, 2020] lies in the transition from data compilation to predictive methodology: the present study prioritises a methodology for predicting salinity. Deriving quantitative analyses of transitional waters from dual time series is an approach with significant, yet largely unexploited, potential for environmentally oriented modelling.

The robustness of the proposed predictive framework was validated through a dual-stage validation process. Firstly, the model's applicability was established through the estimation of salinity levels within a dataset characterised by spatial correlation, with temperature, conductivity, and pH utilised as input predictors. Secondly, the longitudinal consistency of the Scardovari Mare record was utilised for time-series forecasting; specifically, a linear autoregressive model was implemented to extrapolate future trajectories across the full suite of monitored environmental variables.

This study marks a preliminary step towards the development of a real-time monitoring dashboard for the four selected variables, as well as a tool for forecasting salinity dynamics in the transitional waters of the Scardovari lagoon. The 24-hour forecast proved most accurate, although its performance was confined to the interval between 03:30 and 22:30, outside of which predictive accuracy declined. The study also offers a preliminary, conceptual framework for the implementation of edge-computing environmental monitoring systems in sensitive coastal ecosystems.

References

Artegiani A., Bregant D., Paschini E., Pinardi N., Raicich F., Russo A., The Adriatic Sea general circulation: Part I. Air–sea interactions and water mass structure. J. Phys. Oceanogr. 1997, 27, 1492–1514.

Artegiani A., Bregant D., Paschini E., Pinardi N., Raicich F., Russo A., The Adriatic Sea general circulation: Part II. Baroclinic circulation structure. J. Phys. Oceanogr. 1997, 27, 1515–1532.

Blaine McCleskey B., Nordstrom K., Ryan J., Electrical conductivity method for natural waters. Applied Geochemistry. 2011, 26, S227–S229.

Casadei S., Peppoloni F., Pierleoni A., A New Approach to Calculate the Water Exploitation Index (WEI+). Water. 2020, 12, 3227.

Correggiari A., Cattaneo A., Trincardi F., The modern Po Delta system: Lobe switching and asymmetric prodelta growth. Marine Geology. 2005, 222-223, 49–74.

Kaushal S., Likens G., Pace M., Utz R., Haq S., Gorman J., Grese M., Freshwater salinization syndrome on a continental scale. PNAS. 2018, 115, E574–E583.

Piano del Bilancio Idrico del Bacino del Fiume Po e Allegato 1 alla Relazione Generale. Autorità di Bacino del fiume Po (Ed.) 2016.

Pesca e Molluschicoltura nel Delta del Po: il Consorzio Cooperative Pescatori del Polesine O.P.

Rossum J.R., Conductance method for checking accuracy of water analyses. Anal. Chem. 1949, 21, 631.

Thorslund J., van Vliet M., A global dataset of surface water and groundwater salinity measurements from 1980–2019. Nature, 2020, 7:231.

Zhao L., Gkountouna O., Pfoser D., Spatial Auto-regressive Dependency Interpretable Learning Based on Spatial Topological Constraints. ACM Transactions on Spatial Algorithms and Systems. 2019, 5(3).

DOCUMENTATION:

Python code

Dataset Scardovari Interno

Dataset Scardovari Mare

X_test_sviluppo

X_train_sviluppo

X_test_sviluppo and X_train_sviluppo are based on the work of Liang Zhao, Olga Gkountouna and Dieter Pfoser, 'Spatial Auto-Regressive Dependency Interpretable Learning Based on Spatial Topological Constraints', ACM Trans. Spatial Algorithms Syst., 5(3), Article 19 (August 2019).

Available at: http://archive.ics.uci.edu/ml/datasets/water+quality+prediction-1

Categorisation

Type of document
Open source case study