The Wall Of Pain
Many shops, garages and service centers have a trophy area to display interesting failures and solutions to weird problems. This goes by many names: "Wall Of Pain", "Shelf Of Pain" and "Hall Of Failures" being common. In general these areas are away from customers' prying eyes and provide motivation and entertainment for those working in the shop. As a computer tech I see some interesting and weird failures myself. While I will probably never get around to hanging them on the "Wall Of Pain", I have placed them here on this virtual version to shock and entertain all who view.
Years ago one of the first terrifying service calls I had to make was to a customer whose accounting system had just failed. This system ran on a very old and (literally) crusty 286 that had been in use since I was in elementary school. The issue I was called on was a complete failure of the system: upon turning it on in the morning, it simply would not boot. After doing the standard diagnosis (check power supply, peripherals, etc.) I found that the culprit was the hard drive. It was no longer being detected by the system, causing it to hang on startup. Before leaving the customer's site I requested the backup tapes so I could restore the previous day's backup after I installed a new hard drive. Seeing some confusion on the customer's face at the mention of "tape", I glanced at the front of the system and for the first time noticed that there was no tape drive. Upon being asked how the backup was performed, the customer informed me that the system (which had been in use for at least 10 years) had never been backed up.
When I got the machine back to the shop I figured that I might as well take a close look at the hard drive. After all, it would have to go out for data recovery anyway so it's not like I was worried about voiding the warranty. And if I could fix it, the customer could be up and running within an hour instead of a week.
I popped the cover off the hard drive and to my amazement the problem was totally obvious.
Take a very close look at the green wire along the bottom of the drive. Notice how its connection to the terminal block is a bit strange compared to the red? You can see the details in the image below...
If you look carefully you will see a small bit of solder on the green wire. The hard drive had failed because the green wire fell out of the terminal block and was floating around the inside of the case. I soldered the wire back in place, put the cover back on the drive and it booted up as if nothing had happened.
After I made a copy of the vital data and transferred it to a new hard drive, the computer was returned to the customer. Shortly afterward a backup strategy was developed, which lasted a few months until the system was replaced with something written in this century.
In July of 2005, I got a call on a boring Wednesday around noon that a customer's server was down. Apparently, there was an "electrical disturbance" earlier, the server shut down, and now it would not boot up. I was not in a position to go on-site, so I was instructing a consultant over the phone on what to do. In the meantime the details of the electrical disturbance emerged. It seems that a squirrel had somehow managed to get himself/herself between the primary and secondary of the transformer outside the building, thus shunting 13,500V into the building's wiring. The squirrel was vaporized, and the UPS seemed to protect the server and all attached equipment, yet strangely one hard drive was hit.
We went through the standard checks, but the server (running NT4) would take forever to POST, NT would hang at its blue boot screen for minutes at a time, and then finally show a blue-screen-of-death and a core dump. Based on what the consultant observed, I told him that it was likely a bad HD and he would have to remove the disk from the SCSI chain until I could get there and replace it. Luckily, the server was set up with 4 drives, in two mirror sets. So removal would still leave one half of the mirror intact, and thus the customer could keep working in the meantime.
A few nights later the customer was finally able to shut down the server at the end of the day, so we went about replacing the HD. Upon removal of the dead Quantum Viking II 9.1 gig drive, it was noticed that the drive had an odd rattle to it. So after the new drive was installed and while we were waiting for the mirror to resync, I opened up the old drive. To say that I was astonished with what I saw would be an understatement.
This is the worst drive destruction I have ever seen. As far as I can tell, when the surges and spikes hit, the heads got welded to the surface of the platter. Since the drive was spinning at 10K RPM at the time, the welds then broke free and the heads proceeded to grind their way through the platter.
The black dust spread throughout is what was ground off the platter. Notice that some of the heads are totally separated from their arms. The voice coil is fried solid, and clogged with the black dust.
All the platters are physically separated from the spindle. The edges have been ground razor sharp to one of the finest edges I have ever seen.
What I find most amazing is that the drive continued to try and spin. The customer reported hearing a "groaning" noise after the server was re-powered, which finally stopped when the bad drive was removed.
What saved the customer from massive and unrecoverable data loss was the fact that the drives were mirrored, and they had a proper backup system in place. Someone had spent some money and set things up correctly. I'm keeping this drive to show to any customer who complains to me that a proper backup costs too much.
Several years ago on a midsummer Monday morning I received a call from a customer saying that one of their machines had made an odd sound when it was turned on following the weekend. The computer ran one of their large room-sized dye sublimation printers (used for printing large posters) and was thus quite important. In addition, they wanted me there quickly because the noise was disturbing the customers. That last statement caught me off-guard a little as the machine is well behind the counter and around a corner next to the generally very noisy printer.
Upon arrival I heard nothing out of the ordinary and was told that the machine was powered off as it was getting louder. After I put my tools down and hit the "on" switch, it was immediately apparent what they meant. As the hard drives spun up, the entire building was soon filled with an ear-piercing squeal that can only be described as the sort of noise a dentist's drill would make in hell. I powered off the machine and brought it back into the shop in the hope of cloning the drive to another before it completely failed.
Once in the shop I connected a spare drive up to the system and began the cloning process. After a few minutes the headache-inducing squeal had grown in volume so much that it required a makeshift muffler to be constructed out of packing foam and an old toner box. Even with the silencer, ear plugs were necessary and the shop dog (a playful and smart border collie) had curled up in the corner with his paws over his ears. Finally after several hours of enduring the racket and only getting a few hundred kilobytes of data from the disc, I gave up.
Here's a short MP3 (about 400K in size) of the drive powering up and shutting down. For full effect, make sure to have your speakers cranked up to a decent volume.
Bad Hard Drive Bearings Recording (MP3, 459KB)
The drive in question was an older Quantum SCSI drive of an approximately 4 Gig capacity. Sometime over the weekend the bearings had failed. The sound you hear is the spindle scraping across the bad bearings at around 7200RPM. Luckily the machine's only job was to process and queue print jobs for the big printer, so no actual data was stored on the drive. All that was required to get the customer back in business was a new hard drive and an operating system reload.
Well, technically I didn't find it, the customer did...
In December of 2006, a customer called complaining that her printer was making an awful grinding noise and jamming up. It took me about an hour to leave the office and get to the customer's site and when I arrived I was told that they found the problem. Apparently there was a foreign object lodged in their toner cartridge:
He must have wandered in there overnight looking for a warm place to sleep, and then received an awful surprise when the printer was powered on in the morning.
The customer had swapped in a new toner cartridge and was printing fine. Unfortunately the main drive motor in a Lexmark T620 laser printer is a fairly powerful beast and the toner rollers had flattened the little guy quite effectively. So it was my job to clean the goo out of the printer.
This service call made my week and I was still laughing half an hour later. The user of the printer thought it was hilarious and was showing it to everyone in the office, with various reactions. Last I heard she ended up making Christmas cards with the picture...
Our shop contracts for a company that manages computers and software for a great number of pharmacies in the area. As such, I'm often called out to solve a problem without knowing the background info other than the symptoms. This requires some diagnostic skill, and can often lead to interesting conclusions.
This particular call was to fix a machine that had a "locking up" problem. Figuring it was a bad CPU or power supply fan I went over to the customer's for a straightforward fan swap. When I arrived, I was given the entire story. The setup consisted of two machines, a main machine and then another terminal. The main machine held all the data which was shared over the network with the 2nd terminal. A fairly standard setup, one that is employed in virtually every pharmacy I service. Apparently the main machine would randomly shut off, thus freezing the 2nd terminal and requiring a reboot of both. In addition, the main computer was brand new.
I pulled the case and checked it out inside. There was a power cord caught in the CPU fan which prevented it from running. After I removed the cord, I powered up the system, installed the case cover and began returning it to its shelf confident that I had fixed the problem. Moments after I put the system down, it once again powered off. Again I pulled the case and found that the CPU was very hot. The fancy case had an extendable tube that led to a vent in the side of the case, intended to allow the CPU fan to draw air directly into the heatsink. While a good idea in theory, in practice the airflow in the case meant that virtually no actual air was drawn through this tube and the processor ran quite hot after a few minutes because of it. So I blocked off the tube, powered up the system, installed the case cover and put it back on the shelf. Once again, it seemed that it powered off as soon as I took my hands off of it. At this point it was getting a bit annoying since as soon as I had the machine reassembled it would power off again.
Taking the system from the shelf I placed it on the floor, powered it on and began tapping around the case. All was well until I tapped on the back of the power supply; the system powered off. I had found the problem! It was consistent as well. Turn it on, then tap a few times on the back of the supply and it would shut itself down every time. We called the vendor (a friend of the pharmacist...big surprise) who refused to believe that his "brand name" power supply was intermittent, so I recorded the problem on video.
Power Supply Turns Off By Tapping (Windows Media, 1 Meg)
I swapped in a new power supply, verified that it would not shut down when tapped and then left to attend to other calls.
Approximately a week after the first incident, I was called to this pharmacist's other store for the same problem. It seems that he had purchased two new systems from his friend and the 2nd was exhibiting the same symptoms. As soon as I arrived, the first thing I did was to tap the back of the power supply and watch the system shut down. A new power supply of course fixed the problem and it is probably safe to assume that the pharmacist will no longer be purchasing systems from this particular vendor.
There is nothing very exciting about this failure. A customer called and said that every time she printed, her laser printer would make a nasty clacking noise. When I arrived at the customer's office I made some test prints, only to find that the gears in the fuser were stripped. The printer was brought back to the office but ultimately not repaired because a newer model printer would have cost less than replacing the fuser on this one. What is notable however is the number of pages printed according to the diagnostic printout: 252,946! Not too shabby for a low volume printer over a period of 2 years...
Lexmark E321 Fuser Grinding (Windows Media, 1.3 Meg)