2018/12/16

Why Merging Software Products is Harder Than Expected

Recently a friend of mine told the story of a small company that was acquired by a larger company with a competing product. The idea was to "take the best parts of both products" and make something better. Ten years later, that had not happened; in fact, technical issues plus tensions between the two product teams pretty much ensured it never would.

At one of my previous companies, something similar unfolded. The company had two tiers of products for the same market. One was high-end and very expensive; the other was the lower-cost alternative. Both were very mature products, and as you might expect, functionality converged a bit over time: the high-end product added features suitable for smaller customers, while the low-end product expanded its offering. Each product had its own strengths and weaknesses.

Management noticed that and said, "Hey, these two products do a lot of the same things -- why not combine them and get the best of both worlds?" They filled a chart board with post-its showing the commonalities between the products. Each post-it contained words like "generates reports" and "computes portfolio balances". Never mind that each post-it encapsulated big piles of bespoke code that had only ever talked to post-its within the same product.

This is a bit like some big manufacturer saying, "We build trains, and we build buses, and they both do similar things -- moving people and stuff around. The great thing about a bus is that it can go anywhere there is a street. Trains can only go where the tracks have been laid, but they have a much greater capacity, better energy efficiency, and can be customized out of modular components. What if we take the best of a bus and the best of a train, and put them together? We change out the wheels and we have a tra-bus that can go anywhere! And the bus windows are much nicer, they're bigger and people can open them. Let's put those on."

And so it goes, until you have an incredibly ugly train-like thing built on a bus frame that isn't street legal, can't use existing modules (railcars), needs more engines, needs better brakes, and so on.

As incredibly bad as this idea is in (vehicular) hardware, it's even worse in software: at least with these tra-bus monstrosities, the problems of connecting the disparate parts are physical and easy to visualize.
In software, it's too easy to lose sight of how parts fit together, and the possibilities are unbounded (blocking API, async API, RESTful API, redirected I/O, mapped memory, named pipes, sockets, protocols built on sockets, databases, data files, etc.). Also, the parts can be written in entirely different languages that have trouble talking to each other. It doesn't even have to be different languages to be hard: combining C++ written for Windows with C++ written for any other operating system can be insanely difficult. The challenges of integrating software products are enormous, and easy to grossly underestimate if you just break the product down into a bunch of functional boxes scribbled on post-its and say, "Hey, most of these boxes do the same thing!" If you see management doing something like that, it's either time to speak up, or time to update your LinkedIn profile.

Inverting the emulation stack as a thought exercise

The very first processor I worked on as a software professional was the Motorola 6800, an 8-bit processor that could access 64K of RAM, with a clock rate of 1 MHz. The machines I worked on had less memory, and everything was done in assembly (of course).

Today I'm working on a Windows machine with a 64-bit, 4-core processor clocking 2.8 GHz, and accessing... well, it's Moore's Law in action.

Would it be possible to write an AMD64 emulator on the 6800? I don't think it can be done within the 6800's 64K of addressable RAM. Given the complexity of a modern core, more code may be needed to emulate the instruction set than will fit in 64K. Restricting the code to a subset (off-boarding floating point, for example) might help it squeak by, and we would not implement many internal features such as instruction reordering or prefetch.

So... if it takes a couple hundred 6800 instructions to emulate one x64 instruction (on average), if we grant the 6800 its fastest 2 MHz variant and generously assume its two-cycle minimum per instruction (about a million 6800 instructions a second), and if emulation doesn't require any I/O to a file system, then the emulated x64 would be executing about 5K x64 instructions a second -- against a native core retiring a couple billion instructions a second, "only" a 400K-to-1 slowdown.