The Problem of Errors

Recently I was reminded of what a problem error management poses and how much more expensive it is when it is poorly done or not done at all. I have been setting up a new piece of software but had some difficulty in getting one part of it to work. (The vendor and support organization should remain anonymous.) The operation I was attempting would fail but there were no clearly identifiable postings to the error log. And what events did seem coincident made no sense.

Back when I worked on Digital VAX machines there was a joke that the way DEC field service would fix a flat tire would be to check the other three tires first. This support issue seemed to go the same way: I was passed from person to person, retreading the same ground over and over. And of course, the folks I was dealing with were a long way from both me and the guys who write the code. Eventually the problem just went away, leaving no clue as to what had happened or changed to resolve the issue.

But this reminded me of how good the VMS error message convention was – DEC had designated a 32-bit number for reporting errors. This was divided up into three fields – a three-bit field for severity and two larger fields for facility and problem. Essentially, the error number told you who was complaining, what was being complained about and how bad it was. This concept seems to have gotten lost – current software uses numeric error codes, but only some of them are documented in publicly accessible form, and one needs to know who was complaining to interpret the error number correctly. And then there is my favorite, ‘fatal error – fault bucket xxxxxxxx’, which has no online documentation at all.
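To make the idea concrete, here is a minimal sketch of pulling such a number apart. The exact bit positions are an assumption based on the commonly documented VMS condition value layout (severity in bits 0-2, message number in bits 3-15, facility in bits 16-27, plus four control bits at the top that are not mentioned above), and the names `decode_condition`, `ConditionValue` and `describe` are made up for illustration.

```python
from typing import NamedTuple

# Severity names for the three-bit severity field (assumed mapping,
# following the usual VMS convention).
SEVERITY_NAMES = {
    0: "warning",
    1: "success",
    2: "error",
    3: "informational",
    4: "severe",
}


class ConditionValue(NamedTuple):
    severity: int   # how bad it is
    message: int    # what is being complained about
    facility: int   # who is complaining
    control: int    # top four bits (assumed layout)


def decode_condition(value: int) -> ConditionValue:
    """Split a 32-bit status value into its fields (assumed layout)."""
    value &= 0xFFFFFFFF
    return ConditionValue(
        severity=value & 0x7,              # bits 0-2
        message=(value >> 3) & 0x1FFF,     # bits 3-15
        facility=(value >> 16) & 0xFFF,    # bits 16-27
        control=(value >> 28) & 0xF,       # bits 28-31
    )


def describe(value: int) -> str:
    """Render the fields as a short human-readable string."""
    cv = decode_condition(value)
    name = SEVERITY_NAMES.get(cv.severity, "reserved")
    return f"facility={cv.facility} message={cv.message} severity={name}"


if __name__ == "__main__":
    print(describe(0x0000000C))
    print(describe((3 << 16) | (5 << 3) | 2))
```

The point of the layout is that any tool, with no knowledge of the program that failed, can answer the three questions from the number alone.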

And then there is the error log entry that displays a nice user-friendly link saying ‘click here to learn more about this error’ – which takes you to an error page when you click it, because there is no index entry for that error. As a result, I have learned to do a search on Google and not bother with the vendor site at all. Why bother? It never works anyhow.

And along this line there are the health monitoring messages that complain about the health monitoring system, especially when the machine is starting up and the delayed-start services haven’t started yet. After a while, as with the boy who cried wolf, one stops looking at the health monitoring system at all. It may be doing useful things, but since it seems more like a cranky hypochondriac aunt, no one wants to associate with it. Probably not the design intent.

Now in the computer world, all of these errors were created by developer-written code. So someone decided to report ‘C0000005’ for a particular type of error and someone told them it was ok. There may even be a last chance exception handler that reports something before the program drops back to OS command level (preferred) or to bare metal if the problem is really bad. But what seems to be missing is the administrative step to collect this information, provide some additional support comments and put it someplace searchable. So costs were saved on the development side, but more than made up for on the support side. I spent a couple of weeks on this problem before it just went away and the folks I was working with put in a good week on their own plus communication time. Surely this was more costly overall than decent documentation?
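As a rough illustration of that missing administrative step, here is a minimal sketch of a searchable error catalogue. Everything in it is hypothetical (the `ErrorCatalog` and `ErrorEntry` names, the `source` labels, the support note); the only outside fact assumed is that C0000005 is the Windows status code for an access violation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ErrorEntry:
    code: int                      # the number the developer chose to report
    source: str                    # who is complaining (hypothetical label)
    summary: str                   # what the code means in plain language
    support_notes: List[str] = field(default_factory=list)


class ErrorCatalog:
    """In-memory catalogue keyed by (source, code); a sketch, not a product."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], ErrorEntry] = {}

    def register(self, entry: ErrorEntry) -> None:
        key = (entry.source.lower(), entry.code)
        if key in self._entries:
            raise ValueError(f"duplicate entry for {entry.source} {entry.code:#x}")
        self._entries[key] = entry

    def lookup(self, source: str, code: int) -> Optional[ErrorEntry]:
        """Exact lookup: the caller must know who was complaining."""
        return self._entries.get((source.lower(), code))

    def search(self, text: str) -> List[ErrorEntry]:
        """Case-insensitive substring search over summaries and notes."""
        needle = text.lower()
        return [
            e for e in self._entries.values()
            if needle in e.summary.lower()
            or any(needle in note.lower() for note in e.support_notes)
        ]


if __name__ == "__main__":
    catalog = ErrorCatalog()
    catalog.register(ErrorEntry(
        code=0xC0000005,
        source="windows-ntstatus",
        summary="Access violation: the program touched memory it does not own",
        support_notes=["Hypothetical note: ask for the crash dump first."],
    ))
    for entry in catalog.search("memory"):
        print(f"{entry.source} {entry.code:#010x}: {entry.summary}")
```

Registration is the step that seems to get skipped: it forces someone to write down what the number means at the time it is chosen, when the knowledge is cheapest to capture.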

So what happens is that everyone tries out their own personal juju – are we current with patches? Is your network up? How different are the clocks on all your machines? And so forth – if we don’t know what the problem is or why the problem went away, then whatever we were doing, thinking or wearing may have been the reason. Let’s do it again….

Back when I was a systems developer we took turns handling support calls from our customers world-wide. This was referred to as our week in the barrel and while there we were expected to not get anything else done. Our projects all waited for us to climb out. So we had a good idea what the issues were and had access to the source code as well so we could trace out what the programs were doing. I don’t think those folks I worked with had the same luxury. And besides, there are so many layers of code in current programs that finding the root cause might be problematic. And furthermore, modern pipelined processors don’t report fundamental errors synchronously any more – so the current instruction may not have anything to do with the real problem. One can understand the reasons for using interpreters and runtime frameworks – just to get control back for error reporting even at the cost of a bit of performance.

But in a sense the externalizing of customer support has another effect – the results of poor coding, or more likely the collision of multiple pieces of good code that just don’t happen to work together, are handled by people well removed from the perpetrators of said code. Their experience is probably summarized in some tidy management report that may eventually make it back to the developers, but not necessarily. So not only are costs increased, but the learning is diminished by the decoupling of support from development.

Then it struck me that there is a lot of this going on. Corporations and governments contract out their public-facing services and insulate the organization from the responses. You can rant or rave at your elected representative all you want, but if that communication gets handled by their press secretary and never reported upwards it has no effect. Maybe they gauge public response by the weight of the mail and not the content. Or hold public meetings where attendee questions get danced around and then ignored. Or send out a mailing with a question along the lines of ‘have you stopped beating your wife yet?’ Each choice is really the same, so only the form of listening to the public is followed. They don’t really want to know – it gets in the way of their plans and takes their minds off themselves.

A pity, this decoupling of action and support. As actions at many levels get increasingly decoupled from perceptible reality, one wonders if this is how the Romans saw it towards the end?


More Power to the People — Ontario Edition

It has been interesting watching the articles in the Globe and Mail about the problems with getting connected to the MicroFIT program – the backlog of applicants is apparently substantial. Meanwhile, the people who bought into the program, quite literally, are paying the substantial capital cost for their solar panels out of their own pockets when they had expected the government to pay. I remember when this program was announced – Ontario Hydro would pay ‘small’ solar systems a breathtaking $0.80/kWh, but this was later reduced to $0.56 due to an unexpectedly large response. Meanwhile contractors and wannabes were inundating mailboxes across the province with tales of the terrific profits homeowners could make if they bought solar and financed it over 10 years or so, with the expectation that electricity sales would cover the finance costs and produce an open-ended stream of revenue at the end. And these systems are not cheap – I have seen pricing into the low six figures, a pretty hefty bet to make on a politician’s promise. [The weather is more stable and predictable… but I digress.] Sort of like the fairy tales that the wind sales guys tell farmers when they are looking for places to put wind turbines – but that is another rant.

The joke is that the grid is nowhere near robust enough to support the direct connection of thousands of little solar generators, so Hydro has adopted a go-slow approach. And the little solar installer companies that started up on the belief that there would be a cascade of business resulting from the MicroFIT program are wondering what happened and starting to lay off staff. So much for the vaunted ‘green’ jobs.

And I am curious as to how all this inflow of power would be managed. I am aware that the province has been madly building gas turbine generation stations to backfill for the variability of wind – which can go +-100% in a few minutes. These generators spin all the time, so whatever greenhouse gases were saved by the wind turbines get made up for by the gas turbines. With solar I am not so sure what they are planning. The alternate energy folks talk about charging batteries when more is produced than needed – but with these direct grid-connected systems it’s unclear. I don’t think the panels are very happy producing electricity on a nice sunny day and just being disconnected, but I could be wrong. One thing is clear though: the MicroFIT owner is not supposed to help themselves to the power – there are two meters set up, one for the property inflow and another for electricity produced, with no connections between them inside the meters. That way the usage can be monitored and billed at one interval and electricity production monitored and paid for at a very different interval. I have heard once a year…

Meanwhile, residential and small business users are seeing a new ‘Ontario Clean Energy Rebate’ that offsets 10% of the bill every month until 2015.  What I have been reading in the press is that this ‘rebate’ is being financed by additional borrowing by the province so it will all have to be paid back with interest. If true this is a cynical move on the part of the politicians to bribe us with our own future money. I am not sure my grandchildren will be so grateful.

The odd part of all this is that without large scale energy storage and a quantum leap in grid management I am not sure that this new, modern smart grid will be anything more than a very overpriced third world electrical system.  Existing hydro sites only go so far to provide power. When the existing coal and oil plants are torn down and the anxiety set gets their way with nuclear there will be a small base load and a wildly variable bunch of ‘green’ supplies. And the gap will be filled, hopefully, by madly burning natural gas — which seems plentiful right now (but so did a lot of other North American resources until they weren’t).  And does nothing to offset global climate change — although I am sure the spin doctors can come up with some cute coverup. So I suspect that blackouts and brownouts will become more frequent.

One wonders what might have happened if the Province had taken a decentralized approach like the US. There, tax incentives encouraged people to put in their own power – so a personal investment to cover their own costs. The brilliant part of all this is that the load on the grid is reduced, rather than increased, so investments in infrastructure are avoided. But I guess it’s hard for some folks to think about decentralized, resilient and independent solutions when autocratic central control is their personal style.

Update: this morning I was reading an article in the National Post about the experience of Texas with their wind plants. Seems they built out 10,000 MW of capacity and shut down a bunch of coal and oil burning plants – but after a bunch of blackouts clearly related to wind fluctuation they had to bring them back online, not a cheap exercise. And despite the aggressive buildout of wind in Ontario, with a huge project planned to destroy yet another bird sanctuary island, Ontario has a net surplus of electricity and has been exporting it at subsidized rates. The National Post suggested that this is costing Ontario taxpayers a billion dollars a year. And still it goes on. The article cited a study projecting that by the time the wind/solar buildout is completed Ontario electric rates will have doubled and we would have the dubious distinction of being tied with Denmark for the highest power rates in the world. Considering the role of cheap, reliable power in the economic prosperity of the province, one wonders what the all-in cost of having the highest power costs in the world will be. But I guess that once politicians get an idea wedged in their tiny minds, any mention of reality just takes their attention off themselves. Too bad about the millions of other people in the province whose jobs will be lost and lives destroyed for this fantasy.