TLC: The Tiny Little Computer

The Documentation page, and most especially the Architecture and Micro-Architecture section, is even more a work in progress than the rest of TLC. Expect the material to be incomplete and to contain errors until this notice is removed. (There will still be errors after that, but they'll be the ones I haven't found yet!)

"Greyed out" material in the documentation describes features that are planned but not yet implemented.

To write and run programs for TLC, you only need to read Introduction, Assembler and Instruction Set, and Operation. If you want to know more about how a small computer like this one could be realized in hardware and how close the simulation comes to such a realization, you will find that information in Architecture and Micro-Architecture.

Introduction

TLC, the Tiny Little Computer, is a language translator and simulator for a computer with a very limited instruction set and very limited memory. Even so, you can write and run interesting programs with it. The assembler is TLC's language translator. It translates human-readable source programs into binary machine instructions for the virtual machine. The virtual machine is a simulation of the computer. It can load and run programs that have been translated by the assembler.

To write programs for TLC, you will need to read the section Assembler and Instruction Set. To run your programs after they've been translated, you will need to read the Operation section. The Architecture and Micro-Architecture section describes the technical details of the simulated computer that is TLC. With the exception of the input and output facility, there is enough information to construct a prototype from digital logic.

You will find programming to be much easier if you express your algorithms in pseudo code before you begin writing TLC instructions. That allows you to separate the process of algorithm development from the process of programming. That is especially important if you are writing in an assembler language.

Assembler and Instruction Set

The TLC assembler has two panes. The left pane is the source code pane; a source program in the format described below can be loaded from the browser's local storage, pasted into the pane from an external editor, or created by typing directly into the pane. (Note: in web browsers, the tab key moves focus from item to item, and so cannot be used to enter tab characters into the pane.)

A source program loaded from local storage can (and should!) be saved with the "Save" button if it has been modified. A new program should be given a file name with the "Save as" button and input area. File names are limited to upper- and lower-case letters plus digits and underscore. File names are case sensitive.

The "Assemble" button translates the source code into TLC binary object code and produces a listing in the right pane. If there were no assembly errors, the object code is also saved in local storage using the same base name as the source file. A source program that has not been given a name by saving it with "Save as" cannot be assembled.

The "Print" button prints the result of the assembly process.

The file selector and "Delete" button at the far right of the button area allow files saved in local storage to be deleted. Both the source file and the object file (if it exists) are deleted. Deleted files cannot be recovered.

For security reasons, web programs cannot write disk files directly. To save a file outside TLC, load it into the assembler's left pane and use copy-and-paste to copy it to an editor.

Format of Assembly Language Statements

Assembly language statements for TLC are of the form:

[label] op code [address] [comments]

The label, if present, must begin at the left, with no preceding white space. The operation code must be preceded by white space regardless of whether a label is present. (This is different from LMC, which allows operation codes to begin at the left.)

Labels, operation codes, and symbolic addresses are case-insensitive. Labels and symbolic addresses are not length-limited. For valid operation codes, see Instruction Set, below.

Addresses may be symbolic addresses, which correspond to labels defined elsewhere in the source program, or numeric addresses. Numeric addresses may be expressed as decimal numbers or hexadecimal numbers. Hexadecimal numbers must be prefixed with "0x" (the digit zero followed by the letter x). Example: 0x5A or 0x5a. Addresses are limited to the range 0-127, or 0x0 to 0x7f.
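The numeric address rules above can be sketched in a few lines of Python. This is only an illustration, not the assembler's actual code; the function name parse_address is hypothetical.

```python
def parse_address(text):
    """Parse a numeric TLC address written in decimal or 0x-prefixed hexadecimal.

    parse_address is a hypothetical helper, not part of TLC itself.
    """
    token = text.strip().lower()
    # Hexadecimal operands carry the "0x" prefix; everything else is decimal.
    if token.startswith("0x"):
        value = int(token[2:], 16)
    else:
        value = int(token, 10)
    if not 0 <= value <= 0x7F:
        raise ValueError(f"address {text!r} is outside 0-127 (0x0-0x7f)")
    return value


print(parse_address("0x5A"))  # 90
print(parse_address("0x5a"))  # 90
print(parse_address("127"))   # 127
```

Because the prefix test is done on a lower-cased copy, 0x5A and 0x5a give the same address, as the text requires.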

Anything following the address, or for operation codes that do not take an address, anything following the operation code, is treated as a comment.

By default the assembler generates object code beginning at location zero. The address where a sequence of instructions will be loaded can be changed using the org assembler directive, which takes a numeric address as an operand. Example:

      org     0x50
      add     const1    The "add" instruction will be at address 0x50
      brp      done       BRP will be at 0x51, etc.

The first executable instruction must be at location zero, as it will be by default.

The dat assembler directive reserves storage for a 12-bit numeric constant and optionally initializes its value. The constant may be a signed or unsigned decimal number or a hexadecimal number. To express a negative number in hexadecimal, use the two's complement of the unsigned value. Because TLC uses 12-bit words in two's complement format, constants are limited to the range -2048 to +2047. If no operand is given, dat reserves space but does not initialize it. Example:

const1      dat      1
minus1      dat      0xfff     // -1 as a two's complement hexadecimal value
neg1        dat      -1        // the same value written in decimal
datum       dat                // one word, not initialized
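To see why 0xfff and -1 name the same constant, here is a minimal Python sketch of the 12-bit two's complement encoding described above. The helper names to_word and from_word are assumptions made for this illustration.

```python
WORD_BITS = 12
WORD_MASK = (1 << WORD_BITS) - 1  # 0xFFF


def to_word(value):
    """Encode a signed integer as a 12-bit two's complement bit pattern."""
    if not -2048 <= value <= 2047:
        raise ValueError(f"{value} does not fit in a 12-bit word")
    return value & WORD_MASK


def from_word(word):
    """Decode a 12-bit pattern; patterns above 0x7FF represent negative numbers."""
    word &= WORD_MASK
    return word - (1 << WORD_BITS) if word & 0x800 else word


print(hex(to_word(-1)))   # 0xfff
print(from_word(0xFFF))   # -1
print(from_word(0x7FF))   # 2047
print(from_word(0x800))   # -2048
```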

Instruction Set

TLC's instruction set is a superset of that of Stuart Madnick's Little Man Computer, and with the same semantics. In the table below, <addr> represents a numeric or symbolic address, xx represents the address as two hexadecimal digits, and <str> represents a character string. The synonyms for operation codes accommodate different "dialects" of LMC.

Mnemonic	Op code	Description
ADD <addr>	1xx	Add. The contents of memory at address <addr> are added to the accumulator. An Overflow machine-check occurs if the result is greater than 2047 or less than -2048; the truncated result is stored in the accumulator.
SUB <addr>	2xx	Subtract. The contents of memory at address <addr> are subtracted from the accumulator. An Overflow machine-check occurs if the result is greater than 2047 or less than -2048; the truncated result is stored in the accumulator.
STO <addr>	3xx	Store. The contents of the accumulator are stored in memory at address <addr>; the previous contents of memory at <addr> are lost. The mnemonic STA is accepted as a synonym for STO.
LDA <addr>	5xx	Load accumulator. The contents of memory at address <addr> are loaded into the accumulator; the previous contents of the accumulator are lost.
BR <addr>	6xx	Branch unconditionally. The value of <addr> is stored in the program counter, causing the next instruction to be fetched from that location. The mnemonic BRA is accepted as a synonym for BR.
BRZ <addr>	7xx	Branch if zero. If the accumulator holds the value zero, the value of <addr> is stored in the program counter, causing the next instruction to be fetched from that location. If the accumulator is non-zero, the program counter is not modified and the next instruction will be fetched in sequence. The accumulator is not modified.
BRP <addr>	8xx	Branch if positive. If the sign bit of the accumulator is zero, the value of <addr> is stored in the program counter, causing the next instruction to be fetched from that location. If the sign bit is one, the program counter is not modified and the next instruction will be fetched in sequence. The accumulator is not modified. Note that a value of zero in the accumulator will cause the branch to be taken because the sign bit of the number zero is zero.
IN	901	Input. A number is requested from the input subsystem and stored in the accumulator. The previous contents of the accumulator are lost. The input subsystem will not deliver values greater than 2047 nor less than -2048. The mnemonic INP is accepted as a synonym for IN.
OUT	902	Output. The contents of the accumulator are copied to the output subsystem as a decimal value, followed by the newline character. The accumulator is not changed.
HLT	0xx	Halt. Instruction execution stops. Operating the "reset" switch will restart the program from location zero. The mnemonic COB is accepted as a synonym for HLT.

Extended Instructions (Not yet implemented)

Mnemonic	Op code	Description
CALL <addr>	Axx	Subprogram call. The number at the highest memory address is decremented, the program counter is stored at the address pointed by the contents of the highest memory address, and the <addr> portion of the instruction is placed in the program counter. In other words, the highest memory address is used as a stack pointer for a stack that grows downward from high memory. Resetting the machine or loading a program stores the address of the highest memory location in that location. For a machine with 128 words of storage, the value 0x7F is stored at location 0x7F.
RET	Bxx	Return from subprogram. The memory location pointed to by the contents of the highest memory address is loaded into the program counter and the value at the highest memory address is incremented. Only the rightmost seven bits of the value on the stack are used. The address field of the instruction is not used, and is reserved.
OUTN	903	Output with No newline. Identical to OUT except that sending the newline character is suppressed.
OUTC	904	Output character. The rightmost eight bits of the accumulator are sent to the output subsystem, to be interpreted as an ISO-8859 character rather than a number. No newline character is sent.
DAT "<str>"		Character data pseudo-instruction. The characters in <str>, which must be enclosed in double-quotes, are stored in the rightmost eight bits of consecutive words. Example: DAT "Hello, world!" Escaped characters are not allowed. A newline character may be stored by coding DAT 0x00A. If 0x00A is processed by an OUT instruction, it will be rendered as 10. If it is processed by OUTC, it will be rendered as newline.
PIO	9ff	Programmed Input/Output. TLC uses the three highest possible memory addresses, 0xfd, 0xfe, and 0xff as registers for I/O. The programmer must store a device address in 0xfd; the input device is 0x000 and the output device is 0x001. For output, the data word to be written must be stored at address 0xff. For input, the value read will be placed in address 0xff. After the registers are set up, the program issues the pio instruction. The program must then loop, testing address 0xfe until it becomes one, indicating the completion of the I/O operation. Any input or output operation attempted before 0xfe becomes one will cause a machine check.

Operation

Loading and running a program: An assembled program can be loaded into the virtual machine by selecting its name in the drop-down and clicking "Load." The program is loaded into memory, the program counter (PC) is set to zero, and the highest memory location is loaded with its own address so that it can be used as a stack pointer.

Figure 1. The simulated control panel for TLC.

When a program has been loaded, clicking "Run" starts the simulated clock running, and the program runs at a speed proportional to the clock speed. Clicking "Step" executes a single instruction from the location pointed by the program counter. Note that executing a single instruction may take up to six clock cycles.

The "Pause" button stops execution at the end of the current instruction. Because each instruction takes several clock cycles, Pause does not necessarily take effect instantly.

Clock speed: Clock speed is set by a range control (slider). The speed is variable from a few seconds per pulse to about 50 ms per pulse. Speed changes take effect only at the end of the current instruction, not immediately.

Reset: The "Reset" button sets the program counter to zero and loads the highest memory location with its own address. Using Reset allows a program to be restarted without re-loading it. Reset is valid only when the clock is stopped.

Show output and show trace: By default, the green screen area shows the output of the running program. The virtual machine also keeps a record of every instruction that is executed, a trace of the running program. The "Show output" and "Show trace" buttons switch the screen area between the program output and the program trace.

Input area: [not yet implemented] The input area is active only when an input instruction is being executed. The clock stops while input is pending and resumes with the input value in the accumulator. Three kinds of input are accepted:

A decimal number in the range -2,048 to +2,047. The plus sign is not required for non-negative numbers.
A hexadecimal number, prefixed with "0x" in the range 0x0 to 0xfff. Any value over 0x7ff is interpreted as a negative number in two's complement format.
A single printable character; the eight bits of the character's ASCII code are stored in the rightmost eight bits of the accumulator.
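Since the input area is not yet implemented, the following Python sketch is only one possible reading of the three rules above. It assumes that a lone digit such as "5" is taken as a number rather than a character, and the name parse_input is hypothetical.

```python
def parse_input(text):
    """Convert one line of input to the 12-bit value stored in the accumulator."""
    token = text.strip()
    if token.lower().startswith("0x"):
        value = int(token[2:], 16)
        if not 0 <= value <= 0xFFF:
            raise ValueError(f"{text!r} is outside 0x0-0xfff")
        return value  # patterns above 0x7ff already read as negative numbers
    try:
        value = int(token, 10)
    except ValueError:
        # Not a number: fall back to a single printable character.
        if len(token) == 1 and token.isprintable() and ord(token) < 256:
            return ord(token)
        raise ValueError(f"unrecognized input {text!r}") from None
    if not -2048 <= value <= 2047:
        raise ValueError(f"{text!r} is outside -2048 to +2047")
    return value & 0xFFF  # store negatives in two's complement form


print(hex(parse_input("-1")))     # 0xfff
print(hex(parse_input("0x800")))  # 0x800
print(parse_input("A"))           # 65
```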

Register and memory contents: While a program is running on the virtual machine, you can place the mouse cursor over a register or memory cell to see the values in binary or decimal and, for memory cells, how the cell would be interpreted as an instruction. It takes about a second for the tool-tip to appear.

Breakpoints: Clicking in a memory cell will set a breakpoint at that location. When it is about to be loaded as an instruction, the clock will stop so that you can inspect the contents of memory and registers. Click "Run" or "Step" to continue. Using a memory location as data does not trigger a breakpoint.

Saving the output: Click the "select all" icon at the lower right of the output screen, or click in the output screen and press control-A. Press control-C to copy the selected text to the clipboard; the text may then be pasted into another program such as an editor.

Architecture and Micro-Architecture

TLC is entirely a creature of software emulation, but we wanted to show that it is possible to build hardware that will execute the TLC instruction set. With the exception of input and output, the student who applied himself could, in theory, grundle over to the digital logic lab and build a TLC from logic gates, a clock source, and a power supply. If twelve switches were used for input and twelve LEDs for output, it would be possible to realize all of TLC in hardware, but you would get only one input operation per program run.

There are some details left out, such as the fan-out of the gates used and the way loading registers from the C-bus on the falling edge of the clock is implemented. These are engineering details, and the fact that they are omitted does not diminish the practicality of the general design.

Early computers and today's simple computers all follow the pattern of registers, an ALU, two input buses and an output bus. The size of the registers and buses and the design of the ALU depend on the instruction set of the computer. The microprogrammed implementation described here is patterned after an implementation described in Andrew Tanenbaum's Structured Computer Organization, Third Edition (1989).

Memory

The Little Man Computer, after which TLC is patterned, could hold three-digit integers. In two's complement binary, that needs eleven bits, which is a distinctly odd size for a computer word. TLC was designed with a twelve-bit word. That may seem like an odd size as well, but the very successful DEC PDP-8 had twelve-bit words. Data words use twelve-bit two's complement notation, giving TLC an integer range of -2,048 to +2,047. Instruction words have four bits of operation code and eight bits of address. The eight-bit address gives TLC a theoretical capacity of 256 words of memory. Like many real computers, TLC is not "maxed out" with memory; only 128 words are "installed" in the simulated computer.
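The word-size arithmetic and the split of an instruction word into its two fields can be checked with a short Python sketch. The function names encode and decode are assumptions made for this illustration.

```python
# Three decimal digits plus a sign: 999 needs ten magnitude bits, plus one sign bit.
print((999).bit_length() + 1)  # 11


def encode(op, addr):
    """Build a 12-bit instruction word from a 4-bit op code and an 8-bit address."""
    if not 0 <= op <= 0xF or not 0 <= addr <= 0xFF:
        raise ValueError("field out of range")
    return (op << 8) | addr


def decode(word):
    """Split a 12-bit instruction word into (operation code, address)."""
    if not 0 <= word <= 0xFFF:
        raise ValueError("not a 12-bit word")
    return word >> 8, word & 0xFF


print(hex(encode(1, 0x50)))  # 0x150, an add whose operand is at address 0x50
print(decode(0x851))         # (8, 81), a brp to address 0x51
```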

Memory is connected to the CPU by two buses and three control lines. The memory address bus is a unidirectional bus connected to the CPU's memory address register (MAR). The MAR specifies the address in memory to be accessed during a read or write. The memory data bus is a bidirectional bus connected to the memory data register (MDR). The MDR holds data to be written to memory on a write and receives data from memory on a read.

The three control signals are read, write, and presence detect. A read signal commands memory to deliver the contents from the address in the MAR to the memory data bus, and so to the MDR. A write signal commands memory to accept data from the MDR and write it to the location specified by the MAR.

The presence detect (PD) signal causes the memory system to indicate the memory size by placing the highest usable memory address on the memory data bus; it will be available at the memory data register (MDR) in the next clock cycle. In a real computer, presence detect returns detailed information about the memory subsystem, including memory size, on a separate serial connection from the memory module. That information is used by the firmware and operating system. TLC uses presence detect to get the address of the stack pointer. In a real computer, the stack pointer is initialized as part of starting a process. Some computers have a stack pointer register to speed up access to the stack pointer.

With the exception of presence detect, memory operations require two clock cycles. Memory timing is discussed below.

Data Path: Registers and Buses

The data path of TLC is shown in Figure 2. The data path of a computer is the arithmetic and logic unit (ALU), the registers, and the buses that connect them. The instruction decoder (I-decoder) / control unit is also shown in the figure. It is discussed in a later section.

Figure 2. The data path of TLC.

TLC has five registers. The memory address register (MAR) holds one memory address to be used in a memory read or write operation. The MAR is an eight-bit register, allowing TLC to address 256 words of memory. It can be loaded from the C-bus and is connected continuously to the memory subsystem. The memory subsystem only uses this address when commanded to read or write.

The memory data register (MDR) is the same size as the TLC's word size: twelve bits. It can be loaded from the C-bus or driven onto the A-bus. It can also send data to memory on a memory write, or be loaded with data from memory on a memory read.

The accumulator receives the results of arithmetic or logical operations. It is a twelve-bit register. It can drive the A- or B-bus and can be loaded from the C-bus.

The program counter holds the address of the next instruction to be executed. It is an eight bit register that can be loaded from the C-bus and enabled onto the A-bus.

The instruction register holds the instruction currently being executed. It is a twelve-bit register that receives twelve-bit quantities when loaded from the C-bus. When the instruction register is enabled to the A-bus, only the rightmost eight bits are transmitted; the high-order four bits are filled with zeros. The leftmost four bits of the instruction register, corresponding to the operation code of a TLC instruction word, are connected continuously to the I-decoder and control unit.

All three main buses are twelve bits wide. The C-bus can load more than one register simultaneously, although that capability is not used in TLC. Only one register can be enabled to the A-bus and only one to the B-bus during any cycle.

Arithmetic / Logic Unit

The instructions of TLC can be executed with only four ALU functions:

Add: The addends are enabled on the A- and B-buses; the sum is returned on the C-bus.
Subtract: The subtrahend (in MDR) is enabled on the A-bus and the minuend (in Acc) is enabled on the B-bus. The difference is returned on the C-bus.
Copy: The value on the A-bus is copied unchanged to the C-bus.
Increment: The value on the A-bus is incremented and the new value is returned on the C-bus.

Figure 3. One bit slice of TLC's ALU. The ALU is composed of twelve such bit slices.

An ALU to compute those functions can be formed from one full adder per bit with three extra gates (two and gates and an exclusive or gate) that provide for control inputs. One bit slice of such an ALU is shown in Figure 3. The Ena A (enable A) control input causes the value on the A bus to be passed to the ALU and adder. The Inv A (invert A) control input causes the value on the A-bus to be inverted before being passed to the full adder. The Ena B (enable B) determines whether the value from the B-bus is passed to the ALU and adder.

The final ALU is composed of twelve instances of the circuit of Figure 3. A fourth control, increment, is connected to the carry-in bit of the rightmost bit slice. A four-bit configuration of such an ALU is shown in Figure 4.

Figure 4. Bit-slice components of the ALU are linked together as shown.

The four required functions of the ALU are produced with the control signals. In addition, the ALU can emit constants zero, one, and minus one, also shown.

Addition is straightforward; A is enabled, not inverted and B is enabled. For subtraction, the bits of the A-bus, which must be the subtrahend, are inverted by the Inv A control. The Increment control provides for the addition of one to form the two's complement of the value on the A bus, which is added to the minuend on the B-bus. If only A is enabled, the value on the A-bus is copied unchanged to the C-bus. If Enable A and Increment are enabled, the value on the A-bus is incremented (by one) and the result is passed to the C-bus.

	Ena A	Inv A	Ena B	Incr
Add	•		•
Subtract	•	•	•	•
Copy	•
Increment	•			•
Zero
One				•
Minus one		•

If none of the four controls is enabled, the output is constant zero. If only Increment is enabled, the output is constant one. If only Invert A is enabled, the output is all ones, a two's complement minus one.
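The control-signal table can be verified with a small software model of the bit slices. This Python sketch assumes the gate order described above (the Ena A and gate first, then the Inv A exclusive or gate) and a simple ripple carry between slices. The function name alu is hypothetical.

```python
def alu(a, b, ena_a=False, inv_a=False, ena_b=False, inc=False, bits=12):
    """Ripple-carry ALU built from identical bit slices; returns the C-bus value."""
    carry = int(inc)  # Increment feeds the carry-in of the rightmost slice
    result = 0
    for i in range(bits):
        x = ((a >> i) & 1) & int(ena_a)  # and gate: Ena A
        x ^= int(inv_a)                  # exclusive or gate: Inv A
        y = ((b >> i) & 1) & int(ena_b)  # and gate: Ena B
        total = x + y + carry            # full adder
        result |= (total & 1) << i
        carry = total >> 1               # carry out feeds the next slice
    return result


print(alu(5, 7, ena_a=True, ena_b=True))                          # add: 12
print(alu(3, 10, ena_a=True, inv_a=True, ena_b=True, inc=True))   # subtract 10 - 3: 7
print(hex(alu(0x123, 0, ena_a=True)))                             # copy: 0x123
print(hex(alu(0x0FF, 0, ena_a=True, inc=True)))                   # increment: 0x100
print(alu(5, 7), alu(5, 7, inc=True), hex(alu(5, 7, inv_a=True))) # 0 1 0xfff
```

The last line shows the three constants: with both buses disabled, the values on the A- and B-buses do not affect the result.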

The V (overflow) signal is asserted when the carry in to the leftmost bit is different from the carry out of the leftmost bit. The P (positive) signal is the inverse of the leftmost bit, which is the sign bit of a two's complement number. So, P is asserted when the value produced by the ALU is non-negative. The Z (zero) signal is asserted when all bits of the ALU result are zero.
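As an illustration of the three status signals, the Python sketch below adds two 12-bit words and derives V, P, and Z exactly as defined above. The function name add_with_flags is an assumption made for this example.

```python
def add_with_flags(a, b, bits=12):
    """Add two words and return (result, V, P, Z)."""
    mask = (1 << bits) - 1
    low_mask = mask >> 1  # every bit except the sign bit
    # Carry into the leftmost bit comes from adding the lower bits only.
    carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    total = (a & mask) + (b & mask)
    carry_out = total >> bits           # carry out of the leftmost bit
    result = total & mask
    v = carry_in != carry_out           # overflow
    p = ((result >> (bits - 1)) & 1) == 0  # inverse of the sign bit
    z = result == 0                     # all result bits are zero
    return result, v, p, z


print(add_with_flags(0x7FF, 0x001))  # (2048, True, False, False): 2047 + 1 overflows
print(add_with_flags(0xFFF, 0x001))  # (0, False, True, True): -1 + 1 is zero
print(add_with_flags(0x800, 0xFFF))  # (2047, True, True, False): -2048 + -1 overflows
```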

P/Z Latch

The arithmetic and logic unit produces a P (positive) signal that is the inverse of the sign bit of the ALU output, so it is a one when the ALU output is non-negative and a zero when the output is negative. It also produces a Z (zero) signal that is the nor of all 12 result bits, and so is a one when the output of the ALU is zero. These signals are input to the P/Z latch. The P/Z Latch is enabled to write concurrently with the accumulator register, so the P and Z outputs always reflect the state of the accumulator contents. The output of the P/Z latch is input to the I-decoder and control unit, providing a mechanism to test the state of the accumulator without requiring an additional data-path cycle.

I-Decoder and Control Unit

TLC's control unit is microprogrammed. The instruction decoder and control unit must generate 17 control signals, as shown in the table. In addition, we need two bits to control branching in the microprogram, one bit to control use of the op code as a branch target, and eight bits of address for jumps. Each microprogram word is 28 bits; for simplicity, we would use a ROM with 32 bit words because these are likely to be commercially available. The layout of the microprogram requires 256 words of ROM, although not all of them are used. The control signals that must be generated are these:

Data Path Control Signals
ALU	4	Enable A (EnaA); Invert A (InvA); Enable B (EnaB); Increment (Inc)
Memory	3	Read, write, PD
A-bus	4	Select one of four registers to enable onto the A-bus
B-bus	1	Indicates that Acc is to be enabled onto B-bus.
C-bus	5	Selects any of five registers for write from the C-bus
Microprogram Controls
Op Code	1	If 1, the operation code from the instruction register, shifted left 4, is used as branch target
Jump	2	00=no jump; 01=jump if positive (jp); 10= jump if zero (jz); 11=unconditional jump (ju)
Next Addr	8	If jump is non-zero, the address of the next microinstruction

Two of the possibilities for the jump control test the bits of the P/Z Latch, which reflect the current contents of the accumulator.

Tanenbaum (1989) described a notation for representing microcode which he called the micro assembly language, or MAL. The two tables below show the add and brp instructions in a notation similar to Tanenbaum's MAL, and with the actual bits of the microprogram. The complete microprogram is here.

Operation (for add, op code 1)	Loc	ALU				Mem			A-bus				B-bus	C-bus					Op Cd	Jmp	Next Addr
		EnaA	InvA	EnaB	Inc	Read	Write	PD	MDR	Acc	PC	IR	Acc	MAR	MDR	Acc	PC	IR
PC → MAR; rd	00	•				•					•			•
PC+1 → PC	01	•			•						•						•
MDR → IR; op	02	•							•									•	•
IR[addr] → MAR; rd	10	•				•						•		•
(wait)	11
Acc + MDR → Acc;	12	•		•					•				•			•				ju	00

Operation (for brp, op code 8)	Loc	ALU				Mem			A-bus				B-bus	C-bus					Op Cd	Jmp	Next Addr
		EnaA	InvA	EnaB	Inc	Read	Write	PD	MDR	Acc	PC	IR	Acc	MAR	MDR	Acc	PC	IR
PC → MAR; rd	00	•				•					•			•
PC+1 → PC	01	•			•						•						•
MDR → IR; op	02	•							•									•	•
jump positive 82	80																			jp	82
jmp 00	81																			ju	00
IR[addr] → PC;	82	•										•					•			ju	00

Operation code zero (hlt) is handled with digital logic as a special case. A zero in the operation code field is detected using a four-input nor. The output is anded with the Op Code bit of the control word. A result of true stops the processor clock and so stops execution of the microprogram.
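The way the Op Code and Jump fields choose the next microinstruction can be sketched in Python. This is an interpretation of the tables above, not the simulator's code. It assumes that a microinstruction with no jump taken is followed by the one at the next location, as the brp microprogram suggests. The name next_micro_address is hypothetical.

```python
NO_JUMP, JP, JZ, JU = 0b00, 0b01, 0b10, 0b11  # encodings of the Jump field


def next_micro_address(loc, op_bit, jump, next_addr, opcode, p, z):
    """Return the next control-store address, or None when the clock is stopped."""
    if op_bit:
        if opcode == 0:
            # Four-input nor of the op code, anded with the Op Code bit: hlt.
            return None
        return opcode << 4  # op code shifted left 4 is the branch target
    taken = jump == JU or (jump == JP and p) or (jump == JZ and z)
    return next_addr if taken else loc + 1


# End of fetch (location 02) for add, op code 1, and brp, op code 8:
print(hex(next_micro_address(0x02, True, NO_JUMP, 0, 0x1, p=True, z=False)))  # 0x10
print(hex(next_micro_address(0x02, True, NO_JUMP, 0, 0x8, p=True, z=False)))  # 0x80
# brp at location 80: jump to 82 when P is set, otherwise fall through to 81.
print(hex(next_micro_address(0x80, False, JP, 0x82, 0x8, p=True, z=False)))   # 0x82
print(hex(next_micro_address(0x80, False, JP, 0x82, 0x8, p=False, z=False)))  # 0x81
```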

Figure 5. Arrangement of a control word.

Figure 5 shows the layout of a control word in TLC's microprogram control store. The bits are shown in the order they were discussed above. Tanenbaum (1989) pointed out that they would probably be arranged in a way that minimized crossing of conductors when the CPU was laid out for a semiconductor die. That's an engineering detail that need not concern us while we are working at the level of logical design.
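To make the 28-bit layout concrete, the Python sketch below packs the fields in the order they were discussed. It assumes the first field (ALU) occupies the leftmost bits and that signals within a field follow the column order of the microprogram tables. The names FIELDS and pack are hypothetical.

```python
FIELDS = [  # (name, width in bits), leftmost field first
    ("alu", 4), ("mem", 3), ("a_bus", 4), ("b_bus", 1), ("c_bus", 5),
    ("op", 1), ("jump", 2), ("next", 8),
]


def pack(**values):
    """Pack named field values into one 28-bit control word; missing fields are 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


print(sum(width for _, width in FIELDS))  # 28

# Location 12 of add: Acc + MDR -> Acc, then jump unconditionally to 00.
word = pack(
    alu=0b1010,     # EnaA and EnaB
    a_bus=0b1000,   # MDR onto the A-bus
    b_bus=0b1,      # Acc onto the B-bus
    c_bus=0b00100,  # load Acc from the C-bus
    jump=0b11,      # ju
    next=0x00,
)
print(f"{word:028b}")  # 1010000100010010001100000000
```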

Stack

An extension to the LMC instruction set provides for a stack that grows downward from the highest memory address. The highest memory address is used as a stack pointer. It is initialized with its own location when a program is loaded or the machine is reset. That is, address 0x7F is loaded with the value 0x7F. The call instruction decrements SP and stores the program counter at the location pointed by SP. The ret (return) instruction places the value at location SP into the program counter and increments SP.

The location at the highest memory address can be used for program storage provided the call and ret instructions are not used in such a program.
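The stack discipline can be sketched in Python as follows. Because the call and ret instructions are not yet implemented, this only illustrates the description above. It assumes the program counter already points at the instruction after the call when it is saved. The function names are hypothetical.

```python
SP = 0x7F  # the highest installed address doubles as the stack pointer


def reset(mem):
    """Loading a program or resetting stores the address 0x7F at location 0x7F."""
    mem[SP] = SP


def call(mem, pc, target):
    """Decrement SP, save the program counter there, and return the new PC."""
    mem[SP] -= 1
    mem[mem[SP]] = pc
    return target


def ret(mem):
    """Load the saved program counter, increment SP, and return the new PC."""
    pc = mem[mem[SP]] & 0x7F  # only the rightmost seven bits are used
    mem[SP] += 1
    return pc


mem = [0] * 128
reset(mem)
pc = call(mem, 0x05, 0x40)  # outer call; return address 0x05 saved at 0x7E
pc = call(mem, 0x41, 0x50)  # nested call; return address 0x41 saved at 0x7D
print(hex(mem[SP]))         # 0x7d
print(hex(ret(mem)))        # 0x41
print(hex(ret(mem)))        # 0x5
print(hex(mem[SP]))         # 0x7f
```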

Timing

The most important thing about understanding timing in TLC (and real computers) is that things do not happen instantly. Computation with digital logic introduces gate delays, and even sending a signal from one part of the CPU to another isn't instantaneous because the signals travel no faster than the speed of light. The purpose of a CPU's clock is to allow enough time for signals to travel through the gates and buses to perform the desired computations. For real computers, clock speeds are measured in gigahertz: billions of pulses per second. For TLC you can adjust the clock speed from a pulse every couple of seconds to several pulses per second. The idea is to make TLC's clock slow enough for you to observe what is happening.

Instruction Timing

The fetch/decode/execute cycle of the Von Neumann architecture means that each instruction consists of some number of individual steps. In TLC, each step is accomplished in one data-path cycle.

Data-path Timing

Figure 6. Timing of events in a single clock cycle of TLC.

The data-path of a computer comprises the registers, the ALU, and the buses that connect them. TLC completes one data-path cycle with every cycle of the computer's clock. Each instruction takes multiple data-path cycles, so each data-path cycle does part of the work of one instruction.

TLC uses an asymmetric clock; that means one part of the cycle, in this case clock-low, is longer than the other part, the clock-high part of the cycle. The clock-low part of the cycle must be long enough for generation of control signals, propagation of data on the buses, and computation by the ALU. By contrast, the clock-high part of the cycle need be only long enough for the registers to be loaded from the C-bus.

A clock cycle starts on the falling edge of the clock. The falling edge triggers the instruction decoder and control unit to set up the necessary signals. That takes a certain amount of time, shown as Δw in Figure 6.

The control signals include register-enable signals for those registers that are to put their contents on the A- and B- buses. The time for the registers to send their contents to the A- and B-buses, and for the signals to reach the ALU is shown as Δx.

The arithmetic-logic unit is combinational logic; it is computing continuously. Its outputs change in response to changes in the inputs. However, the output of the ALU is not valid until it has valid inputs, and for a time after that equal to the gate delay through the ALU. That time is shown as Δy in the figure. It then takes time Δz for the output of the ALU to travel along the C-bus and be available at the inputs of the registers.

The last time band in the figure is labeled "Tolerance." Because of manufacturing variation, electronic devices manufactured identically will still be slightly different. The allowance for tolerance means that an instance of this CPU that happens to be slightly slower than the design specification will still work correctly.

By the end of the clock-low portion of the cycle, the result of the current computation has propagated through the C-bus and is available at the inputs of the registers. One or more registers will be selected by the "register enable" signals from the I-decoder and control unit to receive the results, and the results will be stored in those registers on the rising edge of the clock. (Usually a result is stored in only one register, but it is possible to store the same result in more than one.)

Notice that no signals are necessary to trigger operations between the falling edge of the clock and the next rising edge. It is only necessary to hold the clock in the low state for long enough to allow propagation of signals through the buses and the ALU.

Memory Timing

In a real computer, memory is many times slower than the CPU. For a computer with a four GHz clock and 15 ns memory, memory is about 60 times slower; the clock will pulse 60 times before memory delivers a result. To compensate, real computers implement cache memory. Most memory requests can be satisfied from a small, fast cache memory that is only about two to ten times slower than the CPU.
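The factor of 60 follows from multiplying the clock rate by the memory latency, since gigahertz times nanoseconds gives clock cycles. A minimal Python check, with a hypothetical function name:

```python
def wait_cycles(clock_ghz, latency_ns):
    """Clock cycles that elapse during one memory access (GHz x ns = cycles)."""
    return clock_ghz * latency_ns


print(wait_cycles(4, 15))  # 60
```

By comparison, TLC charges a fixed two clock cycles per memory access regardless of the simulated clock speed.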

To show that the CPU must often wait on memory, but to keep waiting time from being so long that the simulation is useless, TLC requires two clock cycles for a memory access. That is, if a memory read is commanded in clock cycle one, the result is not available in the memory data register until the beginning of clock cycle three. Whenever possible, the control unit does useful work while waiting for memory. For example, the program counter is incremented while waiting for memory to deliver an instruction in the "fetch" part of the cycle. Otherwise, the control unit executes a no-operation cycle while waiting on memory.

Input and Output

The in and out instructions are executed "behind the scenes" by the simulator. The out instruction completes in four cycles; the explanation might be that the IO subsystem always has a buffer ready for output and can accept in one clock cycle. The clock is stopped while the in instruction executes. No detail of the I/O process is exposed by the simulator when the LMC I/O instructions are used.

TLC can also do memory-mapped I/O, which exposes programmed I/O with busy waiting to the simulated program. TLC uses the three highest possible memory addresses, 0xfd, 0xfe, and 0xff as registers for I/O. Address 0xfd is the I/O address register, 0xfe is the I/O status register and address 0xff is the I/O data register.

The programmer must store a device address in 0xfd; the input device is 0x000 and the output device is 0x001. For output, the data word to be written must be stored at address 0xff. For input, the value read will be placed in address 0xff. After the registers are set up, the programmer issues the pio instruction. The program must then loop, testing address 0xfe until it becomes one, indicating the completion of the I/O operation. Any input or output operation attempted before 0xfe becomes one will cause a machine check.

References

Tanenbaum, Andrew (1989). Structured Computer Organization, Third Edition. Upper Saddle River, NJ, Prentice-Hall.

Tanenbaum, Andrew (2006). Structured Computer Organization, Fifth Edition. Upper Saddle River, NJ, Prentice-Hall.

SUB Instruction
Cycle	Registers	Memory
1	PC → MAR	Read
2	PC + 1 → PC	Wait
3	MDR → IR
4	IR[address] → MAR	Read
5		Wait
6	Acc - MDR → Acc

STO Instruction
Cycle	Registers	Memory
1	PC → MAR	Read
2	PC + 1 → PC	Wait
3	MDR → IR
4	IR[address] → MAR
5	Acc → MDR	Write
6		Wait
