The Duct Retirement Project – An Analysis, Design and Coding
Case Study
Eli Willner
Introduction
I would like to describe one of the most complex and
challenging projects I have worked on in my career to date, the “Duct Retirement
Project”.
This project was noteworthy in that it was one-of-a-kind –
the problem presented did not fit into any pre-existing molds and there was no
precedent to rely on in determining how to approach it. It involved extensive
use of artificial intelligence techniques in ways that were probably unique.
It required a great deal of data analysis and cleanup before the actual project
could begin – a process that itself required extensive use of artificial
intelligence. And it involved broad research into the undocumented practices of
a particular discipline over many decades – research that required numerous
office-based and field interviews of sometimes hostile information sources.
Although the client was a very large electrical utility,
politics dictated that this would be a one-person project, and I was the
person. That meant that after laying all the research and design groundwork I
would actually have to code the thing myself, test it, and prove that it
worked. I was allocated five years for the task. (I finished it, including the
writing of thousands of lines of code, in one year.)
The Problem
This electrical utility lays most of its cabling underground
and has been doing so for more than one hundred years. The cabling is encased
in ducts (pipes) and goes from point to point in a very complex grid that
defines the utility’s service area. Laying the ducts to accommodate the cables
requires extensive digging and is therefore very time-consuming and expensive.
Thus the utility generally lays more duct than is currently needed to avoid the
necessity of another costly digging operation in the future. (Laying cable does
not require digging; there are manhole access points that permit access to the
grid points and cable is laid through them, one grid node at a time.)
Predicting how much extra duct to lay involves projecting
the electrical needs of a particular neighborhood far into the future and is
therefore a very inexact science. The utility generally prefers erring on the
side of caution and frequently over-estimates the need. Thus many of the
underground ducts have never been used. Many others once held cables that
were later retired from service when electrical needs declined. They too are
currently empty.
Now, the duct is laid under municipal property under
arrangement with various jurisdictions and a tax is levied on the utility based
on total duct footage. Thus the utility found itself in the unenviable position
of paying millions of dollars per year in taxes for ducts that were not in use
and would almost certainly never be used in the future. They appealed to the
jurisdictions for relief. After negotiation a settlement was reached: If the
utility could accurately identify the number and type of empty ducts on a
point-to-point basis, and would agree to permanently retire that duct from
service, they would be permitted to write it off their books and would be
exempt from paying tax on it.
Unfortunately the utility kept no record indicating which
ducts were empty and in fact had no record of which cables were in which ducts.
All they had was a primitive database of property records, which indicated the
type of duct, the date it was laid and its endpoints in the grid. They had
similar records for each length of cable. Their best estimate for how many
ducts were empty was “lots”.
Doing the project “manually” by sending crews to perform a
visual check would have taken decades, during which the utility would have had
to keep paying taxes on the empty ducts, and manpower costs would have made that
process prohibitively expensive. The only option was to develop an AI-based heuristic
to determine which ducts were empty on the basis of the information in the
primitive property records database. The assignment was given to the utility’s
internal IT group for scoping. They determined that the project was impossible
and declined to undertake it. But certain senior members of the management team were unwilling to accept that assessment
and obtained clearance to bring in an outside consultant to do an independent
assessment and a pilot. If the pilot was successful a full-scale project would
be launched.
I was that consultant.
The Solution
Assuring Data Integrity
The first task was to clean up the data. The quality of the
data was very poor; it had been entered haphazardly, inconsistently and sometimes
not at all. Over the years, it had been entered from illegible manual sheets filled
in by work crews. It was entered onto Hollerith punch cards using non-standard
“over-punches” to represent esoteric but undocumented details. Records were
80-byte fixed-length binary streams with both fixed-length and variable-length
fields, and there were multiple record types with different field layouts. The
database and record structures were largely undocumented and would need to be
deciphered in order to have raw material upon which to build the analysis to
determine which ducts were empty.
Pattern matching forensics were employed to identify common
factors in the records with the unknown fields. Gradually the structure and
semantics of some of the fields were identified; the newly identified fields
provided additional clues to wring out the meanings of the remaining opaque
fields. This process was repeated until virtually all the ambiguities in the
data were resolved.
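As a rough illustration of this kind of forensic profiling (not the code used on the project), the sketch below tallies which character classes appear at each byte offset across many fixed-length records and flags offsets where the dominant class changes, which hints at a field boundary. The function names, the ASCII assumption and the sample records are all hypothetical; the real records were binary and used over-punches, so an actual profiler would need a richer classification.

```python
from collections import Counter


def classify(byte: int) -> str:
    """Map one byte to a coarse character class (ASCII assumed)."""
    b = bytes([byte])
    if b.isdigit():
        return "digit"
    if b.isalpha():
        return "alpha"
    if b == b" ":
        return "blank"
    return "other"


def profile_positions(records, width=80):
    """Count the character classes seen at each byte offset across all records."""
    profile = [Counter() for _ in range(width)]
    for rec in records:
        for offset, byte in enumerate(rec[:width]):
            profile[offset][classify(byte)] += 1
    return profile


def guess_field_boundaries(profile):
    """Return offsets whose dominant class differs from the previous offset."""
    dominant = [c.most_common(1)[0][0] if c else "empty" for c in profile]
    return [i for i in range(1, len(dominant)) if dominant[i] != dominant[i - 1]]


# Hypothetical 6-byte records: two letters, three digits, one blank.
sample = [b"AB123 ", b"CD456 ", b"EF789 "]
boundaries = guess_field_boundaries(profile_positions(sample, width=6))
```

Each boundary found this way is only a hypothesis; as described above, it would still have to be confirmed against fields whose meaning was already known.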
A virtual connectivity grid of ducts and cables was
constructed to compensate for missing data and to cross-check the validity of
existing data. For example, if point A in the grid connected to point B which
connected to point C and then to point D, and a certain duct type went from A
to B and from C to D, it was virtually certain that there were no “duct gaps”
and the duct also went from B to C even though there was no record of that duct
arc in the database. The database was then augmented with the missing data.
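The gap-filling rule in the A-B-C-D example can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the record shape `(node_a, node_b, duct_type)`, the function name and the duct type label `T1` are hypothetical, and only single-arc gaps along one known path are handled.

```python
def infer_missing_duct_arcs(path, duct_records):
    """Infer duct arcs missing from the records.

    path: ordered list of connected grid nodes, e.g. ["A", "B", "C", "D"].
    duct_records: set of (node_a, node_b, duct_type) tuples from the database.
    Returns the set of (node_a, node_b, duct_type) tuples that are absent from
    the records but are implied by the arcs on either side.
    """
    arcs = list(zip(path, path[1:]))

    def types_on(arc):
        return {t for (a, b, t) in duct_records if (a, b) == arc}

    inferred = set()
    for i in range(1, len(arcs) - 1):
        before, after = types_on(arcs[i - 1]), types_on(arcs[i + 1])
        # A duct type present on both neighbours but absent here is a gap.
        for duct_type in (before & after) - types_on(arcs[i]):
            inferred.add((arcs[i][0], arcs[i][1], duct_type))
    return inferred


records = {("A", "B", "T1"), ("C", "D", "T1")}
missing = infer_missing_duct_arcs(["A", "B", "C", "D"], records)
```

Here `missing` contains the single inferred arc from B to C of type `T1`, which would then be added to the database.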
Similarly, if a cable of type N1 went from A to B and from C
to D, but from B to C there was a cable of type N2 and no cable of type N1 (and
there were no N2 cables between A and B or C and D), then it was virtually
certain that the N2 designation for the B to C arc was wrong and that the cable was
really of type N1, joining the N1 cables on the first and third arcs.
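The cable-type cross-check can be sketched in the same spirit. The dictionary layout and function name below are assumptions made for illustration; the sketch proposes a correction only in the unambiguous case where exactly one type is missing from the middle arc and exactly one type on that arc appears on neither neighbour.

```python
def suspect_cable_types(path, cable_types_by_arc):
    """Propose cable type corrections along one path.

    path: ordered list of connected grid nodes.
    cable_types_by_arc: dict mapping an arc (node_a, node_b) to the set of
        cable types recorded on it.
    Returns a list of (arc, recorded_type, proposed_type) tuples.
    """
    arcs = list(zip(path, path[1:]))
    corrections = []
    for i in range(1, len(arcs) - 1):
        before = cable_types_by_arc.get(arcs[i - 1], set())
        here = cable_types_by_arc.get(arcs[i], set())
        after = cable_types_by_arc.get(arcs[i + 1], set())
        missing = (before & after) - here   # expected here, but not recorded
        orphans = here - (before | after)   # recorded here, seen on no neighbour
        if len(missing) == 1 and len(orphans) == 1:
            corrections.append((arcs[i], next(iter(orphans)), next(iter(missing))))
    return corrections


recorded = {("A", "B"): {"N1"}, ("B", "C"): {"N2"}, ("C", "D"): {"N1"}}
fixes = suspect_cable_types(["A", "B", "C", "D"], recorded)
```

For the example in the text, `fixes` holds one proposal: relabel the N2 cable on the B to C arc as N1.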
Methods like these, as well as other methods too detailed to
document here, were utilized to obtain a relatively clean database that could
then serve as the basis for the cable-to-duct mapping algorithms that would be
used to solve the problem.
The Approach
The single most complicating factor in this project was that
the cable-to-duct correspondence was not one-to-one. There were many varieties
of ducts and many varieties of cable. Most duct varieties could hold multiple
cables depending on cable type. Some cable combinations could not coexist in
the same duct. Moreover the number of cables per duct, for each duct type, was
a function of the variety of cable types traversing the same point-to-point
arc, as well as the number of other ducts available for cable in that arc. The
rules governing cable placement in duct were arcane and, again, largely
undocumented.
In addition to physical limitations, installer custom played
a role. It was often left to the discretion of the cable installing crew to
decide which duct to use when installing a new cable. This decision was made on
the basis of custom – which varied over time and often over location – and
which was, again, undocumented.
In order to definitively nail down the physical rules
governing cable placement, a brute-force computer analysis of the cleansed
property record database was performed. Many permutations of possible
cable/duct combinations were generated and tested against the database; those
that were plausible in theory but never occurred in the data were deemed impossible.
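A heavily simplified sketch of this generate-and-test idea follows. It assumes, purely for illustration, that the only question is whether a given duct type and cable type ever share an arc; the real analysis dealt with far richer combinations (counts per duct, mixes of cable types and so on). The function name and the type labels are hypothetical.

```python
from itertools import product


def never_observed_pairs(arcs, duct_types, cable_types):
    """Return (duct_type, cable_type) pairs that never share an arc in the data.

    arcs: iterable of (duct_types_on_arc, cable_types_on_arc) pairs of sets.
    duct_types, cable_types: every type known to exist in the system.
    """
    observed = set()
    for ducts, cables in arcs:
        # Every duct/cable pairing on this arc is at least possible in principle.
        observed.update(product(ducts, cables))
    # Generate every theoretical pairing and discard those seen in the data.
    return set(product(duct_types, cable_types)) - observed


arc_data = [({"D1"}, {"N1"}), ({"D1", "D2"}, {"N2"})]
impossible = never_observed_pairs(arc_data, {"D1", "D2"}, {"N1", "N2"})
```

With this sample data the only pairing never seen together is duct type `D2` with cable type `N1`, so it is the one candidate ruled out. Absence from the data is evidence rather than proof, which is why a large, cleansed database matters for this step.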
To create a set of rules describing installer custom, a
further data analysis was performed. However, the bulk of the information
necessary to create this rule set required very extensive interviewing,
focusing on senior and middle managers who started in the field, rose through
the ranks and, ideally, had many years of experience under their belts.
Interview subjects were drawn from the property records staff, from the
engineering staff and from the installation and maintenance staffs.
As mentioned previously, not all interview candidates were
sympathetic to our efforts. In particular, the property records staff, who were
the custodians of the poorly maintained database, felt that our efforts were
shining an unwelcome light on the shoddy quality of their work. They would
have preferred for us to simply go away, so they aligned themselves with the
utility’s IT staff, who had previously deemed our project impossible and were
fervently hoping for us to fail in order to avoid looking very foolish. We
employed a great deal of finesse and diplomacy to obtain everyone’s cooperation
and, as a rarely used last resort, called for the intervention of senior
management to compel cooperation.
In the end we succeeded in building a very robust database
of rules documenting both physical and custom-based cable/duct combination
possibilities.
Now the real work started. We created a time-lapse model of
the utility’s entire electrical grid, node to node. For every instance in which a cable was
installed, we took a “time snapshot” of the duct/cable configuration of that
arc, as well as of contiguous arcs, following the larger path of the new cable
group’s length. We virtually envisioned what the installers would have seen
when they were laying that cable arc and its extensions in both directions. We
developed an intricate heuristic to test what would happen if the cable were
placed in each of the available ducts and developed a probability model to
determine where the cable likely ended up. We did this from “day 1” of the property
record file to the current date, circling back in time and revising if necessary
whenever later data made a previous assumption unlikely or untenable.
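The chronological replay can be pictured with a toy sketch like the one below. It is an assumption-laden illustration, not the heuristic actually built: the `Duct` class, the scoring weights, the "prefer a partly filled duct" custom and the single-arc scope are all hypothetical, and cable retirements and the backward revision step described above are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Duct:
    duct_id: str
    capacity: int                      # hypothetical: max cables the duct holds
    cables: list = field(default_factory=list)


def score(duct, cable_type, incompatible):
    """Relative plausibility that an installer chose this duct (0 = ruled out)."""
    if len(duct.cables) >= duct.capacity:
        return 0.0                     # physical rule: duct is full
    if any(frozenset((cable_type, c)) in incompatible for c in duct.cables):
        return 0.0                     # physical rule: types cannot coexist
    # Hypothetical custom rule: crews prefer a partly filled duct to an empty one.
    return 2.0 if duct.cables else 1.0


def replay(installs, ducts_by_arc, incompatible):
    """Replay installations in date order and assign each cable to a duct.

    installs: iterable of (date, cable_id, cable_type, arc) tuples.
    Returns {cable_id: (duct_id or None, estimated probability)}.
    """
    placements = {}
    for _date, cable_id, cable_type, arc in sorted(installs):
        ducts = ducts_by_arc[arc]
        weights = [score(d, cable_type, incompatible) for d in ducts]
        total = sum(weights)
        if total == 0:
            placements[cable_id] = (None, 0.0)   # no feasible duct: flag for review
            continue
        best = max(range(len(ducts)), key=weights.__getitem__)
        ducts[best].cables.append(cable_type)
        placements[cable_id] = (ducts[best].duct_id, weights[best] / total)
    return placements


def empty_ducts(ducts_by_arc):
    """List the ducts that hold no cable after the replay."""
    return sorted(d.duct_id for ducts in ducts_by_arc.values()
                  for d in ducts if not d.cables)


arc = ("A", "B")
grid = {arc: [Duct("d1", 2), Duct("d2", 2), Duct("d3", 1)]}
history = [(1950, "c1", "N1", arc), (1960, "c2", "N1", arc), (1970, "c3", "N2", arc)]
result = replay(history, grid, incompatible={frozenset(("N1", "N2"))})
unused = empty_ducts(grid)
```

In this toy run the two N1 cables end up sharing `d1`, the N2 cable is kept apart in `d2`, and `d3` is reported as empty. A placement with a low probability, or one with no feasible duct at all, is the kind of result that would prompt going back and revising an earlier assumption.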
In the end, based on our heuristic, we produced a detailed
report that showed which ducts contained each and every cable in the system –
and, most importantly, which ducts in the system were empty.
We selected a pilot group of representative system nodes and
senior management arranged for inspectors to do a physical comparison of the
reality against the predictions of our model in our pilot node set. Our
accuracy was better than 98%! Minor tweaks to our algorithms were made based on
feedback from the inspectors and the pilot process was repeated for another set
of pilot nodes, yielding a slightly higher accuracy level.
Management determined that the accuracy level we had
achieved was more than sufficient to impress the various jurisdictions and our
full report was submitted to them. They were given access to the system in
order to do their own physical checks and verified our accuracy claims. The
utility identified which of the empty ducts they were willing to retire and
they wrote those ducts off the books. Taxes were no longer paid on the retired
ducts and millions of dollars per year were saved.
The project was a success!