Created 6 months ago, updated 4 months ago

This data is being shared as part of the acoustic leak detection challenge. The data summary document explains the contents in detail; in summary, the data set contains:

  • Acoustic logger data (columns A to J)
  • Data collated as part of the current acoustic leak detection process (columns K to N)
  • Asset information and Geographic Information System (GIS) data associated with the logger’s location (columns O to AH)
  • Information about jobs raised nearby with the leakage team (columns AI to AS)
  • Leakage savings, where they could be estimated (columns AT to AU)

We have extracted the data from the systems described above and compiled it into this single data set.

As a guide, all of this data is available for building your solution, but only columns B to J and O to AH would be available as input to the model. For clarity, these columns are shaded blue in the published dataset.
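The column ranges above can be turned into positional indices when preparing model inputs. The following is a minimal sketch, not part of the published material: the helper names `column_index` and `column_range` and the constant `MODEL_INPUT_COLUMNS` are hypothetical, and it assumes the spreadsheet columns keep the lettering described here (A is the first column).

```python
def column_index(letters):
    """Convert a spreadsheet column label such as 'A' or 'AH' to a zero-based index."""
    index = 0
    for ch in letters.upper():
        # Column labels behave like a base-26 number with digits A=1 .. Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def column_range(first, last):
    """Return the zero-based indices from column `first` to column `last`, inclusive."""
    return list(range(column_index(first), column_index(last) + 1))


# Columns stated as available to the model: B to J and O to AH.
MODEL_INPUT_COLUMNS = column_range("B", "J") + column_range("O", "AH")
```

If the sheet has been loaded into a table with one entry per column, these indices select the blue-shaded columns by position. If pandas is used, `pandas.read_excel` also accepts a string of column letters through its `usecols` parameter (for example `usecols="B:J,O:AH"`), which avoids the manual conversion.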

This file contains the following data:

  • Acoustic logger data (columns A to J)
  • Data collated as part of the current acoustic leak detection process (columns K to N)
  • Asset information and Geographic Information System (GIS) data associated with the logger’s location (columns O to AH)
  • Information about jobs raised nearby with the leakage team (columns AI to AS)
  • Leakage savings, where they could be estimated (columns AT to AU)

A full explanation is provided below.

The updated dataset contains the following changes:

Fix for an issue with our aggregation tool that wrote incorrect data to columns O:T (pipeType, Diameter, Units, Date_Laid, Age_of_Pipe, Material).

Fix for an issue where column AS (j100mDist) picked up the job raised soonest rather than the closest one. As a result it could, for example, display 99 meters even when the Jobs5m column was populated, which is inconsistent. The column now gives the distance to the closest job raised.
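The inconsistency described in this fix can be checked for directly. The sketch below is illustrative only: the function name `inconsistent_rows` is hypothetical, rows are assumed to be dictionaries keyed by column header, `j100mDist` is assumed to be a distance in meters, and "populated" is assumed to mean any non-empty value in `Jobs5m`.

```python
def inconsistent_rows(rows, near_radius_m=5.0):
    """Return positions of rows where Jobs5m is populated but j100mDist exceeds 5 m."""
    flagged = []
    for position, row in enumerate(rows):
        jobs_near = row.get("Jobs5m")
        distance = row.get("j100mDist")
        # Skip rows where either value is missing; there is nothing to compare.
        if jobs_near in (None, "") or distance in (None, ""):
            continue
        # A job within 5 m implies the closest job cannot be further than 5 m away.
        if float(distance) > near_radius_m:
            flagged.append(position)
    return flagged


sample = [
    {"Jobs5m": 1, "j100mDist": 99},     # the pattern described in the fix
    {"Jobs5m": 1, "j100mDist": 3},      # consistent
    {"Jobs5m": None, "j100mDist": 80},  # no job within 5 m, nothing to check
]
flagged = inconsistent_rows(sample)
```

Running this against the updated dataset should flag no rows if the columns carry the meaning assumed here; any flagged rows would be worth checking against the data summary document.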

In addition, certain data that is unvalidated or unrepresentative has been moved to separate sheets. Details are provided on the relevant sheet.

This document describes each column in the Excel file.

The updated file contains an additional note on column AH, shown in italics.

This file contains the distance to a large water user for each row of the original dataset. The column headed ‘lookup’ appears in both this file and the original published dataset.
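Because the ‘lookup’ column is shared, it can serve as a join key between the two files. The following is a minimal sketch under assumptions: the function name `attach_by_lookup` and the column name `distance_to_large_user` are hypothetical, rows are represented as dictionaries keyed by column header, and each ‘lookup’ value is assumed to occur at most once in the distance file.

```python
def attach_by_lookup(original_rows, extra_rows, key="lookup"):
    """Left-join extra_rows onto original_rows using the shared key column."""
    extra_by_key = {row[key]: row for row in extra_rows}
    merged = []
    for row in original_rows:
        match = extra_by_key.get(row[key], {})
        # Original values take precedence if both files share a column name.
        merged.append({**match, **row})
    return merged


original = [
    {"lookup": "a1", "Material": "PE"},
    {"lookup": "b2", "Material": "CI"},
]
distances = [
    {"lookup": "a1", "distance_to_large_user": 120.0},  # hypothetical column name
]
joined = attach_by_lookup(original, distances)
```

Rows of the original dataset with no match in the distance file are kept unchanged, so the row count of the original data is preserved.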

This file contains similar categories of information to the original data, but here the information relates to the repair jobs identified in the original data.