I was running some CAD and simulation jobs on my main machine and was shocked, then resigned, at how much memory they used without me really trying. I wanted to take on some modeling and simulation work for a local client, and was thinking of ways I could speed things up a bit without running out of resources. Being a resourceful hacker type (i.e. a massive cheapskate hoarder) I have old computers lying around and piles of parts that could be combined into something useful. I had an old (at least 10 years) HP ProLiant DL380 Gen 6 in storage, unloved and unwanted. I had had some success wedging graphics cards into older server machines before, so it seemed like I could do it again.
Let’s get one thing straight right now: server machines and desktop PCs are different animals. These machines are designed to sit in a rack, with only ethernet connectivity, and are not intended as desktops. They generally have only the most basic of graphics outputs, if any at all. The computer manufacturers are clear about what these machines are designed for; if you want a cheaper consumer machine – buy a desktop. If you want more power – buy a workstation. Need a server for your business – then buy a rack and stuff blades into it. They don’t want you hacking around with the server machines and certainly don’t go out of their way to make it easy.
Your usual server blade will be sitting there, with its pile of disks, running some sort of virtualisation and hosting multiple operating systems, each running whatever server application is needed. I wanted to unlock the whole machine and run it ‘bare metal’ with a single operating system and multiple graphics outputs, and get access to all of the machine’s resources. Can I do it? Yes, of course I can. It really isn’t that hard.
Issue 1: GPUs
As I said earlier, server machines are not intended for desktop use, so support for GPU cards is not exactly a priority. The first problem is expandability and physical space. The DL380 uses riser cards for its PCIe slots. It has two slots on the motherboard, which mate with PCBs attached to the PCI riser cage. This also does the trick of rotating the PCI cards so that they lie parallel to the motherboard, allowing the whole unit to have a low profile. Nothing unusual here.
Each riser slot serves a riser card with two PCIe-x8 slots and a single PCIe-x16 slot. There is not a lot of space inside the cage, so a massive GPU card is totally out of the question, and not for that reason alone. The slots do not provide anywhere near enough power, and the system has no available auxiliary power supplies for a GPU. There is only one solution, and that’s to find a GPU that is passively cooled and has a low total power consumption. Finally, there is the issue of cooling. Passively cooled means no fans, and that means the GPU core temperature will depend entirely on the airflow through the case.
The card selected was an MSI version of the old Nvidia GeForce GT 710 due to its small size, low cost, and meager 19W TDP. I was confident these would work just fine. It turns out that this card only uses the PCIe-x8 interface despite having a PCIe-x16 width connector, so more bandwidth could be had in the future if I decide to upgrade to something faster.
Finally, there was not enough space on the rear panel openings in the riser cage, so I needed to modify the metalwork a little. A little filing (OK, a lot of filing) later and the cards fit nicely, so the video cables would actually fully mate, and the cards lock into place.
Once the cards were installed and the machine was hooked up to my quad monitor setup, it was time for some testing. Which is when the problems started. First off, those six CPU fans spooled up to such an extent that the noise was unbearable. Ignoring that for a while (I bought better ear defenders), I moved on to the operating system.
I’m a huge Linux fan, having used it since the days of installing Slackware on piles of 3.5″ floppy disks. I tried. I tried really hard, hour after hour every day for at least a week to get any Linux distro to handle the dual GPU configuration spread over four displays. After countless hours of research, I concluded that there was some partly understood bug in X.Org that was not being actively worked on, and abandoned this line of attack. There was only one viable option; to step into the murky world of Microsoft and install Windows 10.
Issue 2: Operating System
Can you run a desktop operating system on a ten-year-old server, with RAIDed hard disks, two GPUs, and two CPUs and expect it to work without any hassle? Turns out you can! It just worked out of the box, with no configuration necessary. Well, almost. Turns out I had bought a Windows 10 Home Edition license, and that will only support a single CPU socket. That sucked. So, I bit the bullet and spent some more money on a Windows 10 Pro license, which unlocked that second CPU.
With the second socket now accessible, I drifted back to eBay in search of a pair of better processors and snagged some clean server-pulled Xeon X5670s. These have six cores running at 2.9 GHz, giving me a total of 12 physical cores (as 24 threads), albeit with only DDR3 memory. This is the fastest configuration for this machine, but it is still old. However, it should still be quite capable of handling simpler workloads.
Issue 3: System RAM
The system RAM is arranged as three banks of three DIMMs per CPU socket; that’s 18 slots in total. I wedged in as much RAM as I could lay my hands on, and had no issues getting it all to work. However, looking at the reported memory speed and then the system configuration manual, it looks like the system has some bandwidth limitations. The long and short of it is that if you fill all three slots of a bank on either CPU, then the DDR3 memory bus speed drops from the maximum 1333 MHz to just 800 MHz, and that just won’t do.
So, after sacrificing one slot per bank (i.e. dropping six DIMMs) and using the paired DIMMs in the correct slots, I arrived at the fastest configuration, giving a total of 72 GB of DDR3-1333 RAM. I could have gotten a little more but had already spent enough on eBay, and I wasn’t quite done yet.
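To put a rough number on that trade-off, here is a back-of-the-envelope sketch of the theoretical peak memory bandwidth in each case. It assumes the quoted “MHz” figures are really transfer rates (MT/s) and that each bank is a standard 64-bit DDR3 channel; it ignores DIMM type and rank rules, which the configuration manual may also care about. The function names are made up for illustration.

```python
# Peak theoretical DDR3 bandwidth for the two bus speeds mentioned above.
# Assumptions (not from the article): the "MHz" figures are MT/s, and each
# bank is a standard 64-bit (8-byte wide) DDR3 channel.

BYTES_PER_TRANSFER = 8   # 64-bit DDR3 data bus
BANKS_PER_SOCKET = 3     # three banks per CPU socket, per the text
SOCKETS = 2


def bus_speed(dimms_per_bank: int) -> int:
    """Bus speed in MT/s for a given number of DIMMs in each bank."""
    if not 1 <= dimms_per_bank <= 3:
        raise ValueError("each bank has three slots")
    # Filling all three slots in a bank drops the bus from 1333 to 800.
    return 800 if dimms_per_bank == 3 else 1333


def peak_bandwidth_gb_s(dimms_per_bank: int) -> float:
    """Theoretical peak across both sockets, in GB/s (10^9 bytes/s)."""
    per_bank = bus_speed(dimms_per_bank) * 1e6 * BYTES_PER_TRANSFER
    return per_bank * BANKS_PER_SOCKET * SOCKETS / 1e9


if __name__ == "__main__":
    for n in (2, 3):
        print(f"{n} DIMMs per bank: {bus_speed(n)} MT/s, "
              f"{peak_bandwidth_gb_s(n):.1f} GB/s peak")
```

On paper that is roughly 64 GB/s against 38.4 GB/s, so filling every slot would cost about 40% of the peak bandwidth in exchange for the extra capacity. Real-world throughput will be lower in both cases.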
Issue 4: RAID Controller RAM Backup Battery
With old machines come annoying niggles. For these ProLiant boxes, one thing that fails is the rechargeable battery attached to the P410i RAID accelerator. The SAS drives in the front of the machine are controlled by a dedicated hardware RAID controller that has either 256 MB or 512 MB of DDR2 RAM-based writeback cache memory. In order to preserve data integrity, when the operating system shuts down, there is a pause whilst this RAM buffer is flushed out to the drive array. In the event of a power failure, the cache contents are retained with the help of a rechargeable battery, so upon the next boot, the cache can be flushed and the unwritten data preserved. The battery is separate from the P410i and hangs on the end of a special cable.
This battery module itself is NiMH and does self-discharge over time. The system automatically disables the writeback cache until the battery is fully charged and this can take a few hours. A few hours of degraded performance. Not good. Worse still, the battery eventually fails completely, and the writeback cache ends up permanently disabled. The solution is either to find a good spare part or in my case, just rip it out, solder some wires to the contacts, and attach a 4 x AA battery pack. The batteries were charged in a dedicated charger unit before use, so no in-system recharging was needed and the cache was enabled from boot.
Issue 5: Cooling and Excess Noise
As I alluded to earlier, dropping those GPUs in there really upset something. After a bit of reading, I found the culprit: the Integrated Lights-Out (iLO) subsystem. These machines have a dedicated subsystem for remote management, complete with its own dedicated ethernet port, which runs on a small microcontroller on the motherboard. It is powered by the standby power supply, so it’s operational even if the main system is powered off. It is this iLO subsystem that is in control of the cooling.
When the GPU cards were dropped in, they were not recognised as HP PCI cards, and since the iLO doesn’t know their power profile, it has to assume the worst case and crank up the cooling. It turns out to be a fairly common issue, and some enterprising hackers have managed to hack the iLO flash image to add an additional ‘FAN’ control command that allows the cooling to be manually cranked back down again. That was for the Gen8 machine, though, and mine is a much older Gen6. The procedure also seemed risky to me, as the iLO has no ROM image backup, so if the update fails, or is incompatible with this system in some way, it will totally brick the machine. I didn’t want to risk it, so I took an alternative approach, which I think will work out better in the end.
Arduino-Based Cooling Hack: Hardware
The iLO system has control of those six modular cooling fans at the front of the system. Air is drawn in past the drives, then through the huge heatsinks above the CPU sockets and across the RAM modules simultaneously. Next, the air passes over the motherboard, into the PCI riser cage, and over the GPU card heatsink. Finally, it is exhausted out of the rear of the chassis.
The fans are Delta FFR0612DHE units, in a custom plastic enclosure, terminated with a plug that allows them to be hot-plugged straight into the motherboard. The iLO has control of the fan PWM input, and since we can’t change the firmware, the only way to allow fan control from Windows is to break into the PWM signal of each fan.
I chose the easiest route possible, which was to use an Arduino Nano (clone) I bought especially. The fan PWM input requires a 12V inverted PWM signal, which means that full speed is 0% PWM or tied-to-ground, and zero speed is 100% PWM or tied-to-12V. Since the Nano is a 5V unit, I needed a simple circuit with a transistor per fan, to create an open-drain level shifter. I just picked some random N-channel MOSFETs and some 10k resistors. You could use practically anything for this.
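The two inversions are easy to get backwards, so here is a small worked sketch of the arithmetic. It assumes the usual open-drain arrangement: MOSFET source to ground, drain on the fan PWM line with the 10k resistor pulling it up to 12V, and the gate driven by a Nano PWM pin. That wiring is a guess at the circuit described, and the function names are hypothetical. Under that assumption the MOSFET stage inverts the signal once and the fan input inverts it again, so the two cancel.

```python
# Duty-cycle arithmetic for an inverted 12V fan input driven through an
# open-drain N-channel MOSFET stage.
# Assumed wiring (not confirmed by the article): source to ground, drain to
# the fan PWM line with a 10k pull-up to 12V, gate on a Nano PWM pin.


def fan_line_high_percent(speed_percent: float) -> float:
    """Percentage of time the 12V fan PWM line is high for a given speed."""
    if not 0 <= speed_percent <= 100:
        raise ValueError("speed must be between 0 and 100 percent")
    # Inverted input: 0% high is full speed, 100% high is stopped.
    return 100.0 - speed_percent


def nano_pwm_value(speed_percent: float) -> int:
    """8-bit duty value (0-255) for the Nano pin driving the MOSFET gate."""
    # Pin high -> MOSFET on -> fan line pulled low, so the pin must be high
    # for exactly the time the fan line needs to be low.
    pin_high_percent = 100.0 - fan_line_high_percent(speed_percent)
    return round(pin_high_percent * 255 / 100)


if __name__ == "__main__":
    for speed in (0, 20, 100):
        print(f"{speed}% speed: fan line high "
              f"{fan_line_high_percent(speed):.0f}% of the time, "
              f"pin value {nano_pwm_value(speed)}")
```

The upshot is that, with this wiring, the value written to the pin is simply proportional to the desired speed, and the firmware does not need to do any inversion of its own.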
Now that I had the driver side of things made, it was time to hack the fans. It was a straightforward job to break open each fan module, cut off the PWM wire from the connector (leaving enough to reattach if I change my mind) then solder on a length of wire.
One annoying issue was that I could not find a socketed 12V anywhere, so I needed to tap into the system 12V somewhere, in order to feed that buffer board I made. I did the obvious, and just tacked some wires onto the back of the SAS backplane, and called it done.
The buffer board was hooked up to the PWM signals on the Arduino Nano, for a quick test. I wrote a simple sketch that changed the fan speeds one at a time, to prove I hadn’t made any wiring mistakes, then proceeded to install this mess into the rear of the SAS backplane.
One nice feature of the motherboard was the helpful internal USB socket, just behind the far-left end of the backplane. This was perfect for powering and controlling the Nano. A little heat-shrink and tie-wrapping later, the circuits were squirreled away somewhere they would not cause too many problems.
Arduino-Based Cooling Hack: Software
The first thing to think about was failure modes. I wanted a split software package, with simple, dumb firmware on the Arduino and all the smarts on the PC side. The two communicate over the emulated serial port that you get for free with the Arduino stack. The idea was that the Arduino firmware would simply sit there waiting for fan control commands, programming the PWM outputs as appropriate whenever a change was received. The first failure mode to handle was a loss of comms to the Arduino, so the default behaviour was to sit in an infinite loop, expecting six fan control values and updating the hardware each time. If no values arrived before a suitably short timeout expired, all fans would ramp immediately to 50%.
The initial fan speed was 100% for a few seconds, in order to ‘blow out’ any accumulated dust and spiders, then it would settle down to 50% until the operating system had booted and the other end of the link was established.
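The timeout logic can be modelled in a few lines. This is a Python model of the behaviour described, not the actual Arduino sketch. The two-second timeout, the 0 to 100 value range, and the class and method names are all placeholders chosen for illustration.

```python
# A model of the firmware's comms-timeout behaviour.
# TIMEOUT_S, the 0-100 value range, and all names here are hypothetical;
# the text only specifies six values and a fallback to 50%.
from dataclasses import dataclass, field

NUM_FANS = 6
FAILSAFE_PERCENT = 50
TIMEOUT_S = 2.0


@dataclass
class FirmwareModel:
    speeds: list = field(
        default_factory=lambda: [FAILSAFE_PERCENT] * NUM_FANS)
    last_update: float = 0.0

    def on_frame(self, values, now: float) -> bool:
        """Apply a complete set of six fan values received from the PC."""
        if len(values) != NUM_FANS or any(not 0 <= v <= 100 for v in values):
            return False  # malformed frame: ignore it, timeout keeps running
        self.speeds = list(values)
        self.last_update = now
        return True

    def tick(self, now: float) -> list:
        """Run once per pass of the main loop; enforce the comms timeout."""
        if now - self.last_update > TIMEOUT_S:
            self.speeds = [FAILSAFE_PERCENT] * NUM_FANS
        return self.speeds


if __name__ == "__main__":
    fw = FirmwareModel()
    fw.on_frame([30] * NUM_FANS, now=0.0)
    print(fw.tick(now=1.0))  # link alive: the requested speeds are held
    print(fw.tick(now=5.0))  # link silent too long: back to the failsafe
```

Keeping the decision this simple is the point of the split design: the firmware never has to know why the PC went quiet, only that it did.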
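That power-on sequence boils down to a small function of elapsed time. The sketch below is a model only: the five-second blow-out duration and the function name are assumptions, since the text says just “a few seconds”, and it assumes the blow-out always runs to completion before any host request is honoured.

```python
# Model of the power-on fan schedule: full blast, then idle, then host control.
# BLOWOUT_SECONDS and the function name are hypothetical.
from typing import Optional

BLOWOUT_PERCENT = 100
IDLE_PERCENT = 50
BLOWOUT_SECONDS = 5.0


def startup_speed(elapsed_s: float, host_speed: Optional[int] = None) -> int:
    """Fan speed in percent at a given time since power-on.

    host_speed is None until the PC side of the link has sent a value.
    """
    if elapsed_s < BLOWOUT_SECONDS:
        return BLOWOUT_PERCENT   # clear out the dust (and spiders)
    if host_speed is None:
        return IDLE_PERCENT      # operating system not talking yet
    return host_speed            # host is in charge from here on


if __name__ == "__main__":
    for t, host in ((1.0, None), (30.0, None), (120.0, 35)):
        print(f"t={t:>5.1f}s host={host}: {startup_speed(t, host)}%")
```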