This is my first attempt at the ALU - preserved here with all the mess - nothing tidied up.

ALU inputs:
  • A - the primary input
  • B - the secondary input
  • Ci - the carry bit
  • F0, F1, F2 - the three bits needed to select the ALU function
ALU outputs:
  • R - the result
  • Co - the carry bit
After much research I settled on an ALU design that is a simplified/extended version of the AYTABTU ALU. It is shown below.

The total is 13T per bit slice.
  • INV controls whether B is inverted, as it needs to be for SUB.
  • MODE controls whether the carry bit is ignored or not.
  • F1 controls whether gates 5,7,8 and 9 perform XNOR or NAND.
  • F2 controls the output from ADD/XOR/AND - if it's high then all of this is ignored.
  • F3 controls whether gate 6 performs NOR or has zero output.
  • F4 controls whether the inverted ASR bit is enabled.
  • ADD/SUB - MODE is low so that the carry bit is enabled. F1, F2 and F4 are low so that ADD/SUB flows through, and F3 is high so that the NOR on gate 6 outputs low.
  • XOR - as ADD but MODE is high so that gates 10 and 11 are disabled and the XOR flows through to the result
  • AND - as XOR, but F1 is high
  • OR needs F2 high so that gates 11 and 12 are off and F3 low to enable the NOR on gate 6.
  • ASR needs F2 and F3 high so that all the blue circuitry outputs LOW.  F4 is high so as to enable reading of the higher bit, the inverted form of which is inverted again by gate 14.
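To sanity-check the control lines listed above, here is a rough behavioural sketch in Python. It models what each control bit is described as doing, not the actual gates, and several things in it are my assumptions rather than anything stated above: ADD/SUB is read as needing F3 high (so gate 6 outputs low), SUB feeds a carry-in of 1 to bit 0 (ordinary two's complement), OR and ASR run with MODE high, the top bit of ASR repeats the sign bit, and the default width is 8 bits. The names `Ctl`, `OPS`, `bit_slice` and `alu` are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ctl:
    """Control lines for one operation (1 = high, 0 = low)."""
    INV: int = 0
    MODE: int = 0
    F1: int = 0
    F2: int = 0
    F3: int = 0
    F4: int = 0


# Settings taken from the list above; MODE for OR/ASR is an assumption.
OPS = {
    "ADD": Ctl(F3=1),
    "SUB": Ctl(INV=1, F3=1),
    "XOR": Ctl(MODE=1, F3=1),
    "AND": Ctl(MODE=1, F1=1, F3=1),
    "OR": Ctl(MODE=1, F2=1),
    "ASR": Ctl(MODE=1, F2=1, F3=1, F4=1),
}


def bit_slice(a, b, ci, hi, c):
    """Behavioural model of one slice. hi is the next-higher bit of A (for ASR)."""
    b ^= c.INV                               # INV: invert B for SUB
    x = (a & b) if c.F1 else (a ^ b)         # F1: AND instead of XOR
    s = x if c.MODE else (x ^ ci)            # MODE high: carry ignored
    main = 0 if c.F2 else s                  # F2 high: ADD/XOR/AND path ignored
    or_term = 0 if c.F3 else (a | b)         # F3 high: OR path outputs zero
    asr_term = hi if c.F4 else 0             # F4 high: take the higher bit
    r = main | or_term | asr_term
    co = 0 if c.MODE else ((a & b) | (ci & (a ^ b)))  # ripple carry out
    return r, co


def alu(op, a, b=0, width=8):
    """Chain `width` slices; returns (result, carry out of the top slice)."""
    c = OPS[op]
    ci = c.INV  # assumption: SUB starts with carry-in = 1
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        # assumption: the top slice sees its own bit, so the sign is kept
        hi = (a >> (i + 1)) & 1 if i < width - 1 else ai
        r, ci = bit_slice(ai, bi, ci, hi, c)
        result |= r << i
    return result, ci


print(alu("ADD", 5, 3))
print(alu("SUB", 5, 3))
print(alu("ASR", 0b10010000))
```

The carry has to ripple through every slice in turn in the `alu` loop, which is exactly the path that limits the speed of the real circuit.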
And, here it is, one bit slice of the ALU.

Note that the two- and three-input NOR gates are one transistor with 10k resistors to base and 1k to 3.3V. This doesn't work for more than three inputs, so the four-input NOR gates (11 and 14 above) are diodes from all the inputs and then a 10k to base. Getting all of this wired up took me a couple of days and it's still very fragile (too many long bare wires that touch if components are moved). It needs laying out on perfboard.

The Ripple Carry bottleneck.

This is going to be the main bottleneck of the whole CPU, so when I tidy this up it'll get a page of its own. It makes a nice story: first I was worried about the component (leg) count so I designed a very efficient memory cell, then I was worried about the overall speed so I optimised the ripple carry.

First I hacked up something that looked like half the ripple carry. 16 NOR gates, 10k from the output of one to the base of the next. 5k from base to ground (assume 3-input NOR, each with 10k to ground). 1k load. It runs at about 140 kHz so I would expect the whole thing to run at 70 kHz, or about 6 times slower than the memory. This is a bit disappointing, if not completely unexpected. It's time to work out how speed-up capacitors work. Okay - RC should be considerably less than the switch time, which is 1/140000 s for 16T, or about 0.45 µs per transistor. R is 10k so C should be much less than 44 pF. 12 pF runs 25% faster.
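The arithmetic behind those numbers, as a small Python sketch. It assumes one period of the test chain equals 16 gate delays and that the full carry chain is simply twice as long (32 gates); the variable names are made up for the sketch.

```python
# Measured: a chain of 16 NOR gates (half the ripple carry) runs at about 140 kHz.
F_HALF_HZ = 140e3
GATES_HALF = 16
R_BASE_OHMS = 10e3  # 10k from one gate's output to the next base

# Time budget per gate, taking one period of the chain as 16 gate delays.
t_gate_s = 1 / (F_HALF_HZ * GATES_HALF)

# Assumption: the full carry chain is twice as long (32 gates).
f_full_hz = 1 / (2 * GATES_HALF * t_gate_s)

# Capacitance at which RC equals the per-gate time; C should be well below this.
c_limit_pf = t_gate_s / R_BASE_OHMS * 1e12

print(f"per-gate time: {t_gate_s * 1e9:.0f} ns")
print(f"full chain:    {f_full_hz / 1e3:.0f} kHz")
print(f"C limit:       {c_limit_pf:.1f} pF")
```

This gives roughly 446 ns per gate, 70 kHz for the full chain and a limit of about 44.6 pF, so the 12 pF capacitor sits comfortably under the limit.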

More analysis. I've ruled out one-transistor solutions (see Outtakes) so that leaves speeding up the current design. It's gates 10 and 13 that matter, and even then it's only the carry path that is slow, so the other inputs can be diodes (so I don't have to worry about input impedance). It's capacitance that is the killer, so I can:
  • reduce the voltage range, so storing less charge
  • decrease the resistors to deliver more current
  • add "speed up" capacitors to compensate for the transistor capacitance
With a 1k load (my standard) I get a very sharp cut off with the 2N 3904 B311 NPN transistors I'm using (my next order is different, I have to check again). Here is the voltage across the transistor as a function of bias voltage (somewhat crude):

So with <0.63V input for OFF and >0.70V for ON I have great switching. That's only 0.07V to switch over, so much less charge to store for the same capacitance.

Note - Baker Clamps and Schottky diodes so that the transistor never saturates

Note - interesting post saying that it's the OFF time that slows everything down

Note - NPN + PNP push-pull