I seem to be running into every logic design pitfall possible. Although it slows me down quite a bit, I consider this an excellent educational experience.
Race conditions are what hit me this time. In general, a race condition is a state of a logic system in which a particular logic gate temporarily outputs a signal you are not expecting. For a combinational logic function implemented in hardware, you may think the situation is impossible and start testing your chips to find a faulty one. However, once you take gate propagation delay into account, it turns out that short spikes of invalid logic states are possible and quite common in logic systems.
The following example is very illustrative (source: Wiki).
This simple circuit is designed to output the result of the logic function A & ¬A (A and not A). Of course, it is expected to always output logical zero (low state). However, due to the propagation delay of the NOT gate (denoted Δt1 in the picture), the inputs to the AND gate do not arrive at the same time, which results in a very short spike of logic high on the gate’s output. This happens when the value of A changes from 0 to 1 and does not happen when it changes from 1 to 0. After this short glitch, the output signal stabilizes at the expected value. Depending on what you want to do with the output signal (and how soon), it may or may not get you into trouble. Or worse, it may get you into trouble only sometimes, once every 10-15 runs (this is what happened to me), which makes the problem even more difficult to resolve.
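To make the timing argument concrete, here is a minimal discrete-time sketch of the A & ¬A circuit. The two-step inverter delay and the zero-delay AND gate are assumptions chosen for illustration, not measurements of real parts.

```python
def simulate_and_not(a_wave, not_delay=2):
    """Simulate Y = A AND (NOT A) with a delayed inverter.

    a_wave: list of 0/1 samples of input A, one per time step.
    not_delay: inverter propagation delay in time steps (assumed value).
    The AND gate is treated as ideal (zero delay) to keep the sketch short.
    """
    out = []
    for t, a in enumerate(a_wave):
        # The inverter output at time t reflects A as it was not_delay steps ago.
        a_past = a_wave[max(t - not_delay, 0)]
        not_a = 1 - a_past
        out.append(a & not_a)
    return out


# A goes 0 -> 1 at step 5 and 1 -> 0 at step 10.
a = [0] * 5 + [1] * 5 + [0] * 5
y = simulate_and_not(a)
print("A:", "".join(map(str, a)))
print("Y:", "".join(map(str, y)))
```

The printed Y waveform is high for exactly two steps right after the rising edge of A and stays low around the falling edge, which matches the behavior described above.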
I have identified two places in my design where this situation happened and occasionally caused the system to fail:
- The /ENALU signal. This signal is used to enable the ALU to drive the ALU bus. Depending on the value of another signal, /ALUSHR, which decides whether the ALU’s output should be shifted or not, the value of /ENALU is fed to the output enable pin of either the direct driver or the shifting driver. To do so, it passes through a simple decoder circuit consisting of two OR gates and a NOT gate (very similar to the above example). The two OR gates output the enable signals for both drivers, and these signals should be mutually exclusive. The problem is that one of them has a NOT on one of its inputs, with additional propagation delay, resulting in a short period during which both of them are logic low. The rest is predictable: two LS244 drivers trying to drive the same bus. Even though the illegal state is brief, the excessive power draw sometimes caused the drivers and their neighboring devices to fall into oscillation and produce random values.
- The /ENMEM signal. This signal is used to enable memory (RAM or ROM). The situation is similar. On my memory card there is an address decoder which decides which memory chip to enable (by asserting one of the /BANK0RAM through /BANKnRAM signals). However, part of the BANK0 space is actually ROM, and the enable signal should be directed to the ROM chip when certain conditions are met (i.e. the address range is that of ROM). There is some additional decoding logic to assert either /BANK0RAM or /BANK0ROM. It is built using NOT and OR gates, so guess what happens: the RAM and ROM chips attempt to drive the DBUS at the same time. This short glitch stabilizes in most cases but sometimes causes the entire memory read cycle to fail. And it is definitely not healthy for the chips.
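The decoder hazard in both cases can be sketched the same way. The wiring below is an assumption based on the description (two OR gates, one with an inverted select input), and the output names `en_direct_n` and `en_shift_n` are hypothetical; the real schematic may differ.

```python
def decode_enables(enalu_n, alushr_n, not_delay=1):
    """Return (en_direct_n, en_shift_n) waveforms, all active low.

    Assumed wiring (not taken from the actual schematic):
      en_shift_n  = enalu_n OR alushr_n
      en_direct_n = enalu_n OR NOT(alushr_n)   # the NOT adds not_delay steps
    """
    en_direct_n, en_shift_n = [], []
    for t in range(len(enalu_n)):
        # The inverter still sees the select line as it was not_delay steps ago.
        inverted = 1 - alushr_n[max(t - not_delay, 0)]
        en_shift_n.append(enalu_n[t] | alushr_n[t])
        en_direct_n.append(enalu_n[t] | inverted)
    return en_direct_n, en_shift_n


def contention_steps(en_direct_n, en_shift_n):
    """Time steps at which both active-low enables are asserted at once."""
    pairs = zip(en_direct_n, en_shift_n)
    return [t for t, (d, s) in enumerate(pairs) if d == 0 and s == 0]


enalu_n = [0] * 8               # ALU output enabled for the whole window
alushr_n = [1] * 4 + [0] * 4    # select line falls at step 4
direct, shift = decode_enables(enalu_n, alushr_n)
print("both drivers enabled at steps:", contention_steps(direct, shift))
```

In this model the overlap appears only when the select line falls while the enable is already low. On the opposite edge both outputs are briefly high, meaning neither driver is enabled, which is harmless.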
I cannot say I was fully unaware of this situation. However, I thought that if I gave the control signals enough time to settle, they would cause me no real problem. Surprisingly, the problem only surfaced when I rebuilt the MSW circuitry. Previously, it never appeared. It must have something to do with power distribution, or maybe some other electrical phenomenon I am not fully aware of. Even more surprisingly, the problem hardly ever occurred when I attached an oscilloscope probe to the enable pins of any of the memory chips. Whatever was happening, it needed a fix.
Race conditions belong to a class of logic design problems referred to as static and dynamic hazards. There are two ways to prevent static and dynamic hazards in a design. The primary and most elegant method is to design the circuit to have no static hazards (a dynamic hazard is a result of a static hazard). To do so, one should use a Karnaugh map and convert the logic function to a form in which certain K-map static hazard elimination conditions are met (the topic takes up an entire chapter in the Mano/Kime book, so I am not even attempting to explain it here). The simpler way to eliminate problems with a signal that is prone to race conditions is to re-clock it. In other words, you wait before using the signal, long enough to be sure that the illegal value spike is long gone.
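As a rough illustration of the K-map condition, the sketch below checks a two-level sum-of-products expression for static-1 hazards: every pair of adjacent input combinations where the output is 1 must be covered by a single product term. The function f = A·B + ¬A·C is a standard textbook example, not one of the circuits from my design, and the helper names are made up for this sketch.

```python
from itertools import product


def covers(term, assignment):
    """True if the product term (dict var -> required value) is 1 here."""
    return all(assignment[v] == val for v, val in term.items())


def static1_hazards(terms, variables):
    """List single-variable transitions where f stays 1 but no one term holds it."""
    hazards = []
    for values in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, values))
        for v in variables:
            if a[v] == 1:
                continue  # visit each adjacent pair only once
            b = dict(a, **{v: 1})
            f_a = any(covers(t, a) for t in terms)
            f_b = any(covers(t, b) for t in terms)
            shared = any(covers(t, a) and covers(t, b) for t in terms)
            if f_a and f_b and not shared:
                hazards.append((v, {k: a[k] for k in variables if k != v}))
    return hazards


VARS = ("A", "B", "C")
original = [{"A": 1, "B": 1}, {"A": 0, "C": 1}]   # f = A·B + ¬A·C
fixed = original + [{"B": 1, "C": 1}]             # add the consensus term B·C

print("original:", static1_hazards(original, VARS))
print("fixed:   ", static1_hazards(fixed, VARS))
```

The original form reports one hazard (A changing while B = C = 1), and the form with the redundant consensus term reports none, even though both compute the same function.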
I decided to use the latter approach because it was fairly easy to implement. I OR’ed both active-low signals (/ENALU and /ENMEM) with the master clock (CLK0). This way, if they need to be asserted, they are only asserted in the second half of the clock cycle. That is long enough for the ALU control signals and the /BANK0ROM and /BANK0RAM signals to settle. I had test routines running in a loop for about two hours yesterday and the problems disappeared.
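The effect of the clock gating can be sketched with the same kind of toy model. The clock polarity (high in the first half, low in the second), the one-step inverter delay, and the decoder wiring are all assumptions made for illustration.

```python
def or_gate(a, b):
    """Element-wise OR of two 0/1 waveforms."""
    return [x | y for x, y in zip(a, b)]


def delayed_not(wave, delay=1):
    """Inverter whose output lags its input by `delay` steps (assumed value)."""
    return [1 - wave[max(t - delay, 0)] for t in range(len(wave))]


def both_low(a, b):
    """Steps at which both active-low signals are asserted together."""
    return [t for t, (x, y) in enumerate(zip(a, b)) if x == 0 and y == 0]


def decode(en_n, alushr_n):
    """Assumed decoder: two OR gates, one fed through the delayed NOT."""
    direct_n = or_gate(en_n, delayed_not(alushr_n))
    shift_n = or_gate(en_n, alushr_n)
    return direct_n, shift_n


clk0     = [1, 1, 1, 1, 0, 0, 0, 0]   # assumed: high first half, low second half
enalu_n  = [0] * 8                    # ALU output requested for the whole cycle
alushr_n = [1, 1, 0, 0, 0, 0, 0, 0]   # select line changes early in the cycle

raw = both_low(*decode(enalu_n, alushr_n))
gated = both_low(*decode(or_gate(enalu_n, clk0), alushr_n))
print("contention without gating:", raw)
print("contention with gating:   ", gated)
```

Gating helps here only because the select line changes in the first half of the cycle. A select line that changed while CLK0 was low would still produce the overlap.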
Very educational indeed. I guess I’m not the only one who appreciates your documentation effort on this web site. This in particular makes me think about simulation. I’ve never been that inclined to use circuit simulation as a tool, and this is exactly why… writing a simulator capable of taking all these kinds of things into consideration is as hard, time consuming and error prone as just going ahead and building the actual circuit at once! –That’s why I think…
(sorry for the spelling mistake in my previous comment. I meant “That’s WHAT I think”, not “that’s why I think”).