Thursday, April 7, 2016

The Duct Retirement Project – An Analysis, Design and Coding Case Study

Eli Willner

Introduction

I would like to describe one of the most complex and challenging projects I have worked on in my career to date, the “Duct Retirement Project”.
This project was noteworthy in that it was one-of-a-kind – the problem presented did not fit into any pre-existing molds and there was no precedent to rely on in determining how to approach it. It involved extensive use of artificial intelligence techniques in ways that were probably unique. It required a great deal of data analysis and cleanup before the actual project could begin – a process that itself required extensive use of artificial intelligence. And it involved broad research into the undocumented practices of a particular discipline over many decades – research that required numerous office-based and field interviews with sometimes hostile information sources.
Although the client was a very large electrical utility, politics dictated that this would be a one-person project, and I was the person. That meant that after laying all the research and design groundwork I would actually have to code the thing myself, test it, and prove that it worked. I was allocated five years for the task. (I finished it, including the writing of thousands of lines of code, in one year.)

The Problem

This electrical utility lays most of its cabling underground and has been doing so for more than one hundred years. The cabling is encased in ducts (pipes) and goes from point to point in a very complex grid that defines the utility’s service area. Laying the ducts to accommodate the cables requires extensive digging and is therefore very time-consuming and expensive. Thus the utility generally lays more duct than is currently needed, to avoid the necessity of another costly digging operation in the future. (Laying cable does not require digging; there are manhole access points that permit access to the grid points and cable is laid through them, one grid node at a time.)
Predicting how much extra duct to lay involves projecting the electrical needs of a particular neighborhood far into the future and is therefore a very inexact science. The utility generally prefers erring on the side of caution and frequently over-estimates the need. Thus many of the underground ducts have never been used. Many others once held cables that were later retired from service when electrical needs declined. They too are currently empty.
Now, the duct is laid under municipal property under arrangement with various jurisdictions and a tax is levied on the utility based on total duct footage. Thus the utility found itself in the unenviable position of paying millions of dollars per year in taxes for ducts that were not in use and would almost certainly never be used in the future. They appealed to the jurisdictions for relief. After negotiation a settlement was reached: If the utility could accurately identify the number and type of empty ducts on a point-to-point basis, and would agree to permanently retire that duct from service, they would be permitted to write it off their books and would be exempt from paying tax on it.
Unfortunately the utility kept no record indicating which ducts were empty and in fact had no record of which cables were in which ducts. All they had was a primitive database of property records which indicated the type of duct, the date it was laid and its endpoints in the grid. They had similar records for each length of cable. Their best estimate for how many ducts were empty was “lots”.
Doing the project “manually” by sending crews to perform a visual check would have taken decades, during which the utility would have had to keep paying taxes on the empty ducts, and manpower costs would have made that process prohibitively expensive. The only option was to develop an AI-based heuristic to determine which ducts were empty on the basis of the information in the primitive property records database. The assignment was given to the utility’s internal IT group for scoping. They determined that the project was impossible and declined to undertake it. But certain senior members of the management team were unwilling to accept that assessment and obtained clearance to bring in an outside consultant to do an independent assessment and a pilot. If the pilot was successful a full-scale project would be launched.
I was that consultant.

The Solution

Assuring Data Integrity

The first task was to clean up the data. The quality of the data was very poor; it had been entered haphazardly, inconsistently and sometimes not at all. Over the years, it had been entered from illegible manual sheets filled in by work crews. It had been entered onto Hollerith punch cards using non-standard “over-punches” to represent esoteric but undocumented details. Records were 80-byte fixed-length binary streams with both fixed- and variable-length fields, and with multiple record types that had different field layouts. The database and record structures were largely undocumented and would need to be deciphered in order to have raw material upon which to build the analysis to determine which ducts were empty.
Pattern matching forensics were employed to identify common factors in the records with the unknown fields. Gradually the structure and semantics of some of the fields were identified; the newly identified fields provided additional clues to wring out the meanings of the remaining opaque fields. This process was repeated until virtually all the ambiguities in the data were resolved.
A virtual connectivity grid of ducts and cables was constructed to compensate for missing data and to cross-check the validity of existing data. For example, if point A in the grid connected to point B, which connected to point C and then to point D, and a certain duct type went from A to B and from C to D, it was virtually certain that there were no “duct gaps” and the duct also went from B to C even though there was no record of that duct arc in the database. The database was then augmented with the missing data.
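To make the gap-filling idea concrete, here is a minimal Python sketch. It is an illustration only, not the code used on the project: the function name `fill_duct_gaps`, the tuple layout of the records and the `max_gap` limit are all assumptions introduced for this example.

```python
from collections import defaultdict

def fill_duct_gaps(path, duct_records, max_gap=1):
    """Infer duct arcs that are missing along an ordered path of grid nodes.

    path         -- ordered node names, e.g. ["A", "B", "C", "D"]
    duct_records -- set of (from_node, to_node, duct_type) tuples (assumed layout)
    max_gap      -- longest run of missing arcs we are willing to fill (assumption)
    Returns the set of inferred (from_node, to_node, duct_type) records.
    """
    arcs = list(zip(path, path[1:]))
    arc_index = {arc: i for i, arc in enumerate(arcs)}

    # For each duct type, collect the positions along the path where it is recorded.
    seen = defaultdict(set)
    for a, b, duct_type in duct_records:
        if (a, b) in arc_index:
            seen[duct_type].add(arc_index[(a, b)])

    inferred = set()
    for duct_type, positions in seen.items():
        ordered = sorted(positions)
        # Look at each pair of neighbouring recorded arcs and fill short gaps.
        for lo, hi in zip(ordered, ordered[1:]):
            missing = range(lo + 1, hi)
            if 0 < len(missing) <= max_gap:
                for i in missing:
                    inferred.add((arcs[i][0], arcs[i][1], duct_type))
    return inferred

records = {("A", "B", "T1"), ("C", "D", "T1")}
print(fill_duct_gaps(["A", "B", "C", "D"], records))
```

For the A-to-D example above, the sketch infers a single T1 duct on the B-to-C arc. Longer gaps are left alone by default, on the assumption that the further apart the recorded arcs are, the less certain the inference becomes.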
Similarly, if a cable of type N1 went from A to B and from C to D, but from B to C there was a cable of type N2 and no cable of type N1 (and there were no N2 cables between A and B or C and D), then it was virtually certain that the N2 designation for the B to C arc was wrong and that the cable was really of type N1, connecting the first and third arcs of cable type N1.
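The cable-type cross-check can be sketched in the same spirit. Again this is only an illustration under stated assumptions: the function name `correct_cable_types` and the dictionary layout are hypothetical, and the sketch only handles the simple three-arc case described above.

```python
def correct_cable_types(path, cable_types):
    """Flag cable types that look mis-recorded on the middle arc of three.

    path        -- ordered node names, e.g. ["A", "B", "C", "D"]
    cable_types -- dict mapping an arc (from_node, to_node) to the set of
                   cable types recorded on it (assumed layout)
    Returns a list of (arc, recorded_type, corrected_type) suggestions.
    """
    arcs = list(zip(path, path[1:]))
    corrections = []
    for prev_arc, mid_arc, next_arc in zip(arcs, arcs[1:], arcs[2:]):
        prev_t = cable_types.get(prev_arc, set())
        mid_t = cable_types.get(mid_arc, set())
        next_t = cable_types.get(next_arc, set())

        # Types present on both neighbours but absent from the middle arc.
        broken = (prev_t & next_t) - mid_t
        # Types present only on the middle arc, on neither neighbour.
        orphan = mid_t - prev_t - next_t

        # Only suggest a correction when the match is unambiguous.
        if len(broken) == 1 and len(orphan) == 1:
            corrections.append((mid_arc, next(iter(orphan)), next(iter(broken))))
    return corrections

recorded = {("A", "B"): {"N1"}, ("B", "C"): {"N2"}, ("C", "D"): {"N1"}}
print(correct_cable_types(["A", "B", "C", "D"], recorded))
```

The sketch deliberately refuses to guess when more than one type is broken or orphaned on the same arc; in that situation some other evidence would be needed to decide.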
Methods like these, as well as other methods too detailed to document here, were utilized to obtain a relatively clean database that could then serve as the basis for the cable-to-duct mapping algorithms that would be used to solve the problem.

The Approach

The single most complicating factor in this project was that the cable-to-duct correspondence was not one-to-one. There were many varieties of ducts and many varieties of cable. Most duct varieties could hold multiple cables depending on cable type. Some cable combinations could not coexist in the same duct. Moreover the number of cables per duct, for each duct type, was a function of the variety of cable types traversing the same point-to-point arc, as well as the number of other ducts available for cable in that arc. The rules governing cable placement in duct were arcane and, again, largely undocumented.
In addition to physical limitations, installer custom played a role. It was often left to the discretion of the cable installing crew to decide which duct to use when installing a new cable. This decision was made on the basis of custom – which varied over time and often over location – and which was, again, undocumented.
In order to definitively nail down the physical rules governing cable placement, a brute-force computer analysis of the cleansed property record database was performed. Many permutations of possible cable/duct combinations were generated and tested against the database; those that were plausible in theory but never occurred in the data were deemed impossible.
To create a set of rules describing installer custom, a further data analysis was performed. However, the bulk of the information necessary to create this rule set required very extensive interviewing, focusing on senior and middle managers who started in the field, rose through the ranks and, ideally, had many years of experience under their belts. Interview subjects were drawn from the property records staff, from the engineering staff and from the installation and maintenance staffs.
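The brute-force step can be illustrated with a short Python sketch. It assumes, purely for the sake of the example, that the cleansed data can be reduced to observed pairs of a duct type and the group of cable types sharing it (for instance from arcs that carry a single duct). The function name `never_observed` and the `max_cables` limit are hypothetical.

```python
from itertools import combinations_with_replacement

def never_observed(duct_types, cable_types, max_cables, observed):
    """List theoretically possible duct/cable groupings that never occur.

    duct_types  -- iterable of duct type names
    cable_types -- iterable of cable type names
    max_cables  -- largest number of cables to try per duct (assumption)
    observed    -- set of (duct_type, sorted tuple of cable types) pairs
                   actually found in the data (assumed layout)
    """
    candidates = sorted(cable_types)
    impossible = []
    for duct in duct_types:
        for n in range(1, max_cables + 1):
            # Each combo is a sorted tuple, so it compares directly with
            # the sorted tuples stored in `observed`.
            for combo in combinations_with_replacement(candidates, n):
                if (duct, combo) not in observed:
                    impossible.append((duct, combo))
    return impossible

observed = {
    ("D1", ("N1",)),
    ("D1", ("N2",)),
    ("D1", ("N1", "N1")),
    ("D1", ("N1", "N2")),
}
print(never_observed(["D1"], ["N1", "N2"], 2, observed))
```

In this toy data set the only grouping that never appears is two N2 cables in a D1 duct, so that is the one combination the sketch reports as a candidate for an "impossible" rule.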
As mentioned previously, not all interview candidates were sympathetic to our efforts. In particular the property records staff, who were the custodians of the poorly maintained database, felt that our efforts were shining an unwelcome light on the shoddy quality of their work. They would have preferred for us to simply go away, so they aligned themselves with the utility’s IT staff, who had previously deemed our project impossible and were fervently hoping for us to fail in order to avoid looking very foolish. We employed a great deal of finesse and diplomacy to obtain everyone’s cooperation and, as a rarely used last resort, called for the intervention of senior management to compel cooperation.
In the end we succeeded in building a very robust database of rules documenting both physical and custom-based cable/duct combination possibilities.
Now the real work started. We created a time-lapse model of the utility’s entire electrical grid, node to node. For every instance in which a cable was installed we took a “time snapshot” of the duct/cable configuration of that arc, as well as of contiguous arcs, following the larger path of the new cable group’s length. We virtually envisioned what the installers would have seen when they were laying that cable arc and its extensions in both directions. We developed an intricate heuristic to test what would happen if the cable were placed in each of the available ducts and developed a probability model to determine where the cable likely ended up. We did this from “day 1” of the property record file to the current date, circling back in time and revising if necessary, in the event that later data made a previous assumption unlikely or untenable.
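A heavily simplified sketch of the time-lapse replay, in Python, might look like the following. It is not the heuristic that was actually built: the function name `replay_installations`, the two callback rules and the data layouts are all assumptions made for illustration, and the sketch omits cable retirements, contiguous arcs and the step of circling back to revise earlier placements.

```python
def replay_installations(ducts, installs, can_hold, preference):
    """Replay cable installations in date order and guess each cable's duct.

    ducts      -- dict mapping an arc to the list of duct ids on it
    installs   -- list of (date, arc, cable_type) records, in any order
    can_hold   -- can_hold(duct_id, current_cables, cable_type) -> bool,
                  standing in for the physical rules (hypothetical callback)
    preference -- preference(duct_id, current_cables, cable_type) -> float,
                  standing in for installer custom (hypothetical callback)
    Returns (contents, decisions, empty_ducts).
    """
    contents = {d: [] for arc_ducts in ducts.values() for d in arc_ducts}
    decisions = []
    for date, arc, cable_type in sorted(installs, key=lambda rec: rec[0]):
        # Snapshot: which ducts on this arc could physically take the cable?
        candidates = [d for d in ducts.get(arc, [])
                      if can_hold(d, contents[d], cable_type)]
        if not candidates:
            # A fuller model would revisit earlier placements here.
            continue
        weights = {d: preference(d, contents[d], cable_type) for d in candidates}
        total = sum(weights.values())
        best = max(candidates, key=lambda d: weights[d])
        probability = weights[best] / total if total > 0 else 0.0
        contents[best].append(cable_type)
        decisions.append((date, arc, cable_type, best, probability))
    empty_ducts = sorted(d for d, cables in contents.items() if not cables)
    return contents, decisions, empty_ducts

# Toy rules: a duct holds at most two cables, and crews prefer a duct
# that is already in use over opening a fresh one.
ducts = {("A", "B"): ["d1", "d2", "d3"]}
installs = [(1960, ("A", "B"), "N1"), (1950, ("A", "B"), "N1")]
contents, decisions, empty_ducts = replay_installations(
    ducts, installs,
    can_hold=lambda d, cur, c: len(cur) < 2,
    preference=lambda d, cur, c: 2.0 if cur else 1.0,
)
print(empty_ducts)
```

With these toy rules both cables end up in the first duct and the other two are reported as empty. The probability stored with each decision is simply the chosen duct's share of the total preference weight, which is one plausible way, among many, to express how confident the placement is.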
In the end, based on our heuristic, we produced a detailed report that showed which ducts contained each and every cable in the system – and, most importantly, which ducts in the system were empty.
We selected a pilot group of representative system nodes and senior management arranged for inspectors to do a physical comparison of the reality against the predictions of our model in our pilot node set. Our accuracy was better than 98%! Minor tweaks to our algorithms were made based on feedback from the inspectors and the pilot process was repeated for another set of pilot nodes, yielding a slightly higher accuracy level.
Management determined that the accuracy level we had achieved was more than sufficient to impress the various jurisdictions, and our full report was submitted to them. They were given access to the system in order to do their own physical checks and verified our accuracy claims. The utility identified which of the empty ducts they were willing to retire and they wrote those ducts off the books. Taxes were no longer paid on the retired ducts and millions of dollars per year were saved.
The project was a success!