March 12, 1958

Kolsky Memo 2

Project 7000

File Memo

Subject: Second Report on Results of SIGMA Timing Simulator Program

# 1. Introduction

To evaluate the performance of an asynchronous computer such as SIGMA one must get down to the detailed interaction of the components under typical operating conditions.

The first report on this subject (Project 7000 File Memo dated 2/6/58) describes how we have attempted to make quantitative measurements of the performance of the SIGMA computer using a Timing Simulator code written for the 704.

This report lists the studies which have been done since the last report. Most of them have been directed toward evaluating the effect of the recently proposed redesign of the Indexing Arithmetic Unit and its interrelationship with the more realistic Arithmetic Unit times now being quoted.

The effort of the past month has been in the direction of obtaining results as soon as possible, not in making the simulator a more precise mirror of the SIGMA machine. The results must, therefore, be considered as approximate in details, although the large trends should be essentially correct.

# 2. Test Problems Used

The main test problems used continue to be the Mesh Calculation and the Monte Carlo Calculation described in the February 6th memo. The Mesh Calculation was used for the main IAU-AU studies since it uses a more or less "normal" distribution of indexing and arithmetic operations. (Reference: File Memo dated March 5, 1958).

In addition to these, a few runs were also made on three problems which have been used by others in inter-comparing IBM machines with the LARC, the TRANSAC, etc. They are:

- (1) <u>The Westinghouse Reactor problem</u>: The calculation of the inner loop of the numerical solution of a neutron diffusion equation. It is very heavy on arithmetic, with very little logic.
- (2) <u>Ziller's Transac Test problem</u>: The evaluation of a polynomial using computed indices.
- (3) <u>Matrix Inversion</u>: The inner loop of a matrix inversion routine. Arithmetic and logic are approximately of equal importance. The shortness of the loops makes effective multiplexing difficult.

# 3. Designs Studied

The chief differences between the "standard design" described in the February 6th memo and the designs being studied in this report are in the Indexing Arithmetic Unit, the Arithmetic Unit times, and the inclusion of index core memory. For convenience, the other items, which were not changed, are also included in the following list:

The "January" and "February" Designs:

- a. Machine Components:
  - 1. Levels of look-ahead
  - 2. Number of Instruction Memories
  - 3. Number of Main (data) Memories
- b. Computer Speeds:

|    |                         | January | February |
|----|-------------------------|---------|----------|
| 1. | Indexing Time*          | 1.45 us | 0.9 us   |
| 2. | Arithmetic Unit Times** |         |          |
|    | Fl Add                  | 1.2 us  | 1.0 us   |
|    | Fl Mpy                  | 2.5 us  | 1.7 us   |
|    | Fl Div                  | 4.0 us  | 2.7 us   |
|    | Fetch                   | 0.6 us  | 0.6 us   |
|    | usual 6-6-3-1 average   | 1.40 us | 1.09 us  |

- \*This is the total time to index one order; it includes instruction decoding, index fetch, index addition, and storing the modified address. The January number is actually an average for a two-instruction full word of 1.9 us for the first instruction and 1.0 us for the second instruction. The February number assumes an even 0.9 us rate for the first and second instructions.
- \*\*The January AU times are unofficial estimates. The February AU times are those recommended by S. W. Dunwell in a memo dated February 14, 1958. The fetch time is actually Hamming check time.
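The memo does not spell out what the "6-6-3-1" weights refer to. The following is a minimal sketch, assuming they apply to fetch, add, multiply, and divide in that order; this ordering is an inference, chosen because it reproduces the tabulated averages to within 0.01 us. The function and variable names are illustrative only.

```python
# ASSUMPTION: "6-6-3-1" means 6 fetches, 6 adds, 3 multiplies, 1 divide.
# The memo does not state this; the order is inferred from the table.
WEIGHTS = {"fetch": 6, "add": 6, "mpy": 3, "div": 1}

# Arithmetic Unit times in microseconds, copied from the table above.
JANUARY = {"fetch": 0.6, "add": 1.2, "mpy": 2.5, "div": 4.0}
FEBRUARY = {"fetch": 0.6, "add": 1.0, "mpy": 1.7, "div": 2.7}


def weighted_au_time(times, weights=WEIGHTS):
    """Return the weighted mean AU time in microseconds."""
    total = sum(weights[op] * times[op] for op in weights)
    return total / sum(weights.values())


def january_index_time(first=1.9, second=1.0):
    """Average indexing time over a two-instruction full word (us)."""
    return (first + second) / 2


if __name__ == "__main__":
    print(f"January AU average:  {weighted_au_time(JANUARY):.3f} us")
    print(f"February AU average: {weighted_au_time(FEBRUARY):.3f} us")
    print(f"January indexing:    {january_index_time():.2f} us")
```

Under this assumption the January figure comes out slightly below the quoted 1.40 us, which suggests the memo rounded its averages.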

- c. Memory Speeds:

| 1. | Fast (Instr.) Memory Times |           |
|----|----------------------------|-----------|
|    | Read out time              | 0.4 usec. |
|    | End signal time            | 0.4 usec. |
|    | Memory cycle time*         | 0.6 usec. |

\*(The actual effective cycle time is 0.9 usec. since the bus clocking permits successive references to the same memory box only in multiples of 0.3 usec and the memory box must be free at the time of the reference not just finishing.)

| 2. | Main (Data) Memory Times |           |
|----|--------------------------|-----------|
|    | Read out time            | 0.8 usec. |
|    | End signal time          | 1.7 usec. |
|    | Memory cycle time*       | 2.0 usec. |

\*(The effective cycle is 2.1 us for the same reason as above.)
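The two effective cycle times can be reproduced with a small calculation. This is a sketch of one reading of the footnotes: a memory box may be referenced again only at a multiple of the 0.3 usec bus clocking, and a box that finishes exactly at that instant does not yet count as free. The function name and the use of integer hundredths of a microsecond (to avoid floating-point rounding) are choices made for this illustration.

```python
def effective_cycle(nominal, slot=30):
    """Effective memory cycle time in hundredths of a microsecond.

    ASSUMPTION: the next reference to the same box is allowed at the
    first multiple of `slot` that is strictly later than the nominal
    cycle time (the box must be free, not just finishing).
    """
    return (nominal // slot + 1) * slot


if __name__ == "__main__":
    for name, nominal in (("Fast (Instr.) memory", 60),
                          ("Main (Data) memory", 200)):
        eff = effective_cycle(nominal)
        print(f"{name}: nominal {nominal / 100:.1f} usec, "
              f"effective {eff / 100:.1f} usec")
```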

| 3. | Index Core Memory Times |           |
|----|-------------------------|-----------|
|    | Read out time           | 0.4 usec. |
|    | Memory cycle time       | 0.8 usec. |

The index cores are assumed tied directly to the IAU, so these figures include bus times.

- d. Bus Speeds:
  - 1. Buses to and from Instruction and Data memories: 0.2 usec slot (either read or write) available every 0.3 usec.
  - 2. Decode and switching time in central control unit: 0.2 usec to 0.4 usec (depending on bus slots available).

Note: A separate bus system to Instruction and Data memories is assumed.

In addition there is usually a 0.1 usec delay between the completion of any function and the beginning of the next one by the unit, or in the transfer from one register to another.

# 4. Results and Conclusions

A list of the parameter studies run since the February 6th memo is given in Appendix I.

Appendix II consists of graphs of some of the runs showing the variation of SIGMA computing speed vs. various parameters. In each case the speed is expressed relative to the 704 running a version of the same problem.

(a) SIGMA performance for various problems

Table 1 lists the speed of SIGMA on the five problems which have been tried to date. One striking feature is the range of speeds which appears: from 40 to 86 for the improved times. This points up the difficulty of giving a single speed performance figure. It also indicates that SIGMA is not just a "speeded-up 704", but a machine with considerably different organization.

SIGMA shows the biggest improvement over the 704 in the problems which are largely floating arithmetic: Mesh, Westinghouse, and Transac Test. It shows less improvement for the problems involving logic and indexing: Monte Carlo and Matrix Inversion. (See Graph 1.)

(b) The effectiveness of the February Improvements in the IAU

All the problems showed an improvement as a result of the February improvements in the Indexing Arithmetic Unit. Those heavy on indexing naturally showed the most. (See table 1)

The variation of speed vs. IAU times for various Arithmetic times is shown in Graphs 2 and 3. The important point to notice is that although the changes in AU and IAU are each worth about 10% in speed separately, taken together they make a 30% improvement. Graph 4 shows the AU efficiencies as a function of the AU and IAU times.
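The non-additive behavior can be checked against the Mesh Calculation speeds in item 2 of Table 1. The sketch below compares the measured combined speed with what would be expected if the two speed-up factors simply multiplied. The labels attached to the four runs follow the item's title (IAU and AU changed separately) and are a reading of the table, and the helper names are illustrative.

```python
# Mesh Calculation speeds (relative to the 704), Table 1, item 2.
BASE = 56.0      # (a) January IAU and AU times
IAU_ONLY = 62.0  # (b) IAU improved to 0.9 us
AU_ONLY = 62.0   # (c) IAU left at 1.45 us, AU times improved
BOTH = 73.0      # (d) February IAU and AU times


def gain(speed, base=BASE):
    """Fractional speed improvement over the base run."""
    return speed / base - 1.0


def independent_prediction(base, *speeds):
    """Speed expected if each change's speed-up factor multiplied."""
    result = base
    for speed in speeds:
        result *= speed / base
    return result


if __name__ == "__main__":
    predicted = independent_prediction(BASE, IAU_ONLY, AU_ONLY)
    print(f"IAU alone: {gain(IAU_ONLY):+.1%}")
    print(f"AU alone:  {gain(AU_ONLY):+.1%}")
    print(f"Both, if independent: {predicted:.1f} ({gain(predicted):+.1%})")
    print(f"Both, as measured:    {BOTH:.1f} ({gain(BOTH):+.1%})")
```

The measured combined speed exceeds the independent prediction, which is consistent with each unit partly masking improvements in the other.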

(c) The effect of Instruction Memory Speed

As was found in the previous runs, the Monte Carlo problem with its frequent branching is more sensitive to the instruction memory speed than the Mesh Calculation. However, with the present speeds, as contrasted to the higher "standard" speeds used before, the performance is only about half as sensitive to the change in memory speed as it was (10% decrease instead of 22%). The positive effect of having more instruction memory does not appear in these figures.

(d) The effect of the number of Instruction Memory Boxes

Graph 5 shows rather conclusively that there appears to be no gain beyond two instruction memory boxes for these arithmetic speeds.

(e) The effect of the number of Main (Data) Memory Boxes

Graph 6 shows that the performance has become less sensitive to the number of Main Memories than was true using the "standard" speeds. There is still a pronounced loss if one mixes data and instructions, however.

(f) The effect of changing divide speed only

Because of interest expressed in the importance of divide speed alone, several runs were made with different divide times assumed. The results are that divide reduces the speed about the same as the change in the 6-6-3-1 average arithmetic time would predict. For example, changing the divide from 2.7 us to 9.0 us changes the average AU time from 1.09 us to 1.48 us which from Graph 3 implies a speed of 61., whereas the actual run gave 62.
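The change in average AU time quoted here can be worked out directly. As a check, assuming the 6-6-3-1 weights apply to fetch, add, multiply, and divide in that order (an inference from the tabulated averages, not something the memo states), only the divide term changes:

$$
\bar{t}_{AU} = \frac{6\,t_{\text{fetch}} + 6\,t_{\text{add}} + 3\,t_{\text{mpy}} + 1\,t_{\text{div}}}{16},
\qquad
\Delta\bar{t}_{AU} = \frac{9.0 - 2.7}{16} \approx 0.39\ \text{us},
\qquad
\bar{t}_{AU} \approx 1.09 + 0.39 = 1.48\ \text{us}.
$$

Under this reading each additional microsecond of divide time adds only 1/16 us to the average, which is why divide speed alone has a modest effect.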

(g) The effect of number of levels of look-ahead

Graph 7 shows the performance vs. number of levels of look-ahead for 4 Main memories and for 1 Main memory. The speed continues to rise past 4 levels, but the gains become relatively small.

(h) The effect of Index Core Memory times

The use of a small core array for index registers has been included in the Simulator since the previous runs. Graph 8 shows the effect of various assumed cycle times on the Mesh Calculation for three sets of Arithmetic speeds. Here again, the performance is less sensitive to core cycle times when the arithmetic speeds are low than when they are high.

The 0.8 us cores themselves seem to cause about an 11% reduction in performance from that of 0.3 us transistor registers at the February speeds. However, they also have the insidious effect of discouraging other improvements which might be possible now or later but which would be **masked** by the core cycle times if we put index cores in SIGMA.

(i) The effect of varying the 2.0 us memory read out time

Graph 9 shows that the performance is decreased by later read-out times, but that it is not a strongly varying function for small changes.

(j) The effect of simultaneous Input-Output upon computing speed

A series of runs was made varying the average I/O word rate while the regular program was running. The Simulator assumes that a high speed disk is storing words in consecutive memory locations, taking priority over other memory references. The effect on the Mesh Calculation was surprisingly small. The Mesh Calculation is a favorable case, since the index registers are used in it to hold all intermediate results and these are not disturbed by the disk. More cases should be examined before making further generalizations.

# 5. Summary

The improvements proposed since the January estimate are certainly very worth while. However, the performance is still about a factor of two below that expected in the Los Alamos contract.

The SIGMA system becomes, in percentage terms, less sensitive to all variations when its speed is low than when it is high. We must be careful not to let this apparent insensitivity encourage us to drop 5% here and 5% there as being unimportant. The SIGMA system is very non-linear and these losses can add up to considerably more than 10%. Conversely, we do not want to freeze one part of the machine at a low level which does not matter now but may block the effects of future gains elsewhere in the system.

Harwood G. Kolsky Representative Product Planning Coordinator Project 7000

John Cocke Staff Engineer Project 7000

HGK: JC: jcv

Table 1: Summary of Main Effects on Computer Speed.

Unless otherwise stated runs were made with 4 Data Memories 2.0 us, 2 Instruction Memories, 0.6 us, 1 Index Core memory, 0.8 us and 4 levels of look-ahead.

| No. | Description of Run                                        | Speed     | Speed     | % change |
|-----|-----------------------------------------------------------|-----------|-----------|----------|
| 1.  | Effects of IAU time change only                           | Jan. Est. | Feb. Imp. | % change |
|     | (a) Mesh Calc (using AU=1.09 us)                          | 62.       | 73.       | +18.     |
|     | (b) Monte Carlo                                           | 29.       | 40.       | +36.     |
|     | (c) Westinghouse Reactor Calc.                            | 83.       | 86.       | + 4.     |
|     | (d) Transac Test Problem                                  | 64.       | 73.       | +13.     |
|     | (e) Matrix Inversion                                      | 35.       | 44.       | +26.     |
|     | Average                                                   | 55.       | 63.       | +15.     |
| 2.  | Effect of IAU and AU changes separately                   |           | Speed     | % change |
|     | (a) Jan. Est. Times: I = 1.45 us, AU                      |           | 56.       | 0        |
|     | (b) I = 0.9 us, AU                                        |           | 62.       | +10.     |
|     | (c) I = 1.45 us, AU                                       |           | 62.       | +10.     |
|     | (d) Feb. Imp. Times: I = 0.9 us, AU                       |           | 73.       | +30.     |
| 3.  | Effect of Instruction Memory Speed                        | 0.6 us FM | 2.0 us FM | % change |
|     | (a) Mesh Calc (with I=0.9 us, AU=1.                       |           | 71.5      | - 2.     |
|     | (b) Monte Carlo                                           | 40.       | 36.       | -10.     |
| 4.  | Effect of changing Divide Speed only (IAU=0.9 us)         |           | Speed     | % change |
|     | (a) Mesh calc. with 1.0 us Divide                         |           | 75.       | + 3.     |
|     | (b) Mesh calc. with 9.0 us Divide                         |           | 60.       | -18.     |
| 5.  | Levels of Look-ahead (IAU=0.9, AU=1.09)                   |           | Speed     | % change |
|     | (a) Mesh Calc with 3 levels                               |           | 69.       | -5.6     |
|     | (b) Mesh Calc with 5 levels                               |           | 74.       | +2.3     |
| 6.  | Effect of Varying No. of Instruction Memories             |           | Speed     | % change |
|     | (a) Mesh calc with 1 0.6 us FM                            |           | 73.       | 0        |
|     | (b) Mesh Calc with 1 2.0 us FM                            |           | 64.       | -12.     |
| 7.  | Effect of Varying X-Core cycle times                      |           | Speed     | % change |
|     | (a) Mesh calc with 0.4 us cores                           |           | 81.       | +10.     |
|     | (b) Mesh calc with 0.2 us cores                           |           | 83.       | +14.     |
| 8.  | Effect of Varying 2.0 us Data mem. read out time          | 0.8 us RO | 1.2 us RO | % change |
|     | (a) Mesh calc with 2.0 us FM                              | 72.       | 71.       | - 1.     |
|     | (b) Monte Carlo with 2.0 us FM                            | 36.       | 35.       | - 3.     |
| 9.  | Effect of I/O memory interference                         |           | Speed     | % change |
|     | (a) Mesh calc with I/O storing every 8.0 us on average    |           | 71.       | - 3.     |
|     | (b) Mesh calc with I/O storing every 2                    |           | 66.       | -10.     |
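The "Average" row of item 1 can be checked from the five per-problem speeds. The sketch below reads item 1 as pairing each problem with one January and one February speed (Mesh 62 and 73, Monte Carlo 29 and 40, Westinghouse 83 and 86, Transac 64 and 73, Matrix Inversion 35 and 44). The dictionary and function names are illustrative.

```python
# Speeds relative to the 704, Table 1 item 1: (January, February).
SPEEDS = {
    "Mesh": (62, 73),
    "Monte Carlo": (29, 40),
    "Westinghouse": (83, 86),
    "Transac Test": (64, 73),
    "Matrix Inversion": (35, 44),
}


def average_speed(column):
    """Arithmetic mean of one column (0 = January, 1 = February)."""
    values = [pair[column] for pair in SPEEDS.values()]
    return sum(values) / len(values)


def percent_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    jan = round(average_speed(0))
    feb = round(average_speed(1))
    print(f"Average: Jan. {jan}, Feb. {feb}, "
          f"change {percent_change(jan, feb):+.0f}%")
```

The rounded averages agree with the table's 55 and 63. Percent changes recomputed from the rounded speeds can differ by a point or two from the tabulated ones, presumably because the memo worked from unrounded run results.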

### APPENDIX I

SIGMA Timing Simulator Runs Made February 4 to March 5, 1958

#### For Mesh Calculation

- 1. Varying X-Core times: 0.1, 0.2, 0.4, 0.6, 0.8 usec. with AU = 0.64; IAU = 0.6
- 2. #1 with AU = 1.40; IAU = 1.75
- 3. AU = 1.40; IAU = 1.75, X cores and 2.0 us memory
- 4. #3 with no IAU buffer
- 5. AU = 1.09, IAU = 1.75, Transistor X-register
- 6. Varying IAU times for 1.09 usec. AU, X-cores, 2.0 us FM
  - IAU = 0.8, 0.9, 1.0, 1.15, 1.25, 1.35, 1.45, 1.55, 1.75
- 7. Varying AU times and IAU times, X-cores, 0.6 us FM
  - AU = 0.29, 0.51, 0.79, 1.09, 1.35, 1.63
  - for IAU = 0.8, 0.9, 1.0, 1.1, 1.2, 1.4
- 8. Varying divide time only: 1, 3, 5, 7, 9 usec.
- 9. Varying No. Look-Aheads for February times: 1, 2, 3, 4, 5, 6, 7, 8
- 10. Varying I/O time: 16, 12, 8, 4, 2 usec. rate
- 11. Varying X Core times for February times: 2, 4, 6, 8 usec. cycle
- 12. January and February IAU for 1.09 us AU time
- 13. Varying No. Instruction Memories for 0.6 us and 2.0 us FM: 1, 2, 3, 4
- 14. Varying No. Data memories for 0.6 us and 2.0 us FM:1, 2, 3, 4, 5, 6, 7, 8
- 15. Data and Instruction both in MM: 1, 2, 3, 4, 5, 6, 7, 8
- 16. With 0.6 us MM, Data and Instruction both in MM: 1, 2, 3, 4, 5, 6, 7, 8
- 17. Varying No. Data Memories and No. levels look-ahead
  - No. MM's: 1, 2, 3, 4, 5, 6, 7, 8
  - for No. Look-Aheads: 1, 2, 3, 4, 5, 6, 7, 8
- 18. Varying MM read-out time: 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 usec.

#### For Monte Carlo Calculation

- 1. Varying X-Core times: 0.1, 0.2, 0.4, 0.6, 0.8 usec. with AU = 1.40 and IAU = 1.75
- 2. AU = 1.40, IAU = 1.75 with Transistor X-registers
- 3. AU = 0.64, IAU = 0.6 with LA = 4, 8 Transistor X-registers
- 4. AU = 1.09, IAU = 1.75 with LA = 4, 8 Transistor X-registers
- 5. Varying IAU times for 1.09 usec. AU, X-Cores, 2.0 us FM
  - IAU = 0.8, 0.9, 1.0, 1.15, 1.25, 1.35, 1.45, 1.55, 1.75
- 6. Varying AU times for 0.9 usec. IAU time, X-Cores, 2.0 us FM
  - AU = 0.29, 0.51, 1.09, 1.35, 1.63
- 7. January and February IAU for 1.09 us AU
- 8. Varying No. Instruction Mems. for 0.6 us FM and 2.0 us FM: 1, 2, 3, 4
- 9. Data and Instruction both in Main Memory
- 10. Varying MM read-out time: 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 usec.

#### For Transac Test Problem, Matrix Inversion, and Westinghouse Reactor Problem

Each run with January and February IAU for 1.09 us AU.

### APPENDIX II: GRAPHS

Graph 1: SIGMA Computer Speed vs. Percentage of Floating Point Operations Executed (using 4 Main Mems 2.0 us, 2 Fast Mems 0.6 us, 4 levels of look-ahead, 1.09 us AU ave. times, 1 Index Mem 0.8 us). Vertical axis: speed. Horizontal axis: percent floating point ops. executed. Points plotted: Westinghouse, Transac Test, Mesh, Monte Carlo, Matrix Inv.

Graph 6: SIGMA Computer Speed vs. Number of Main Memory Boxes for various treatments of Instruction Memory (4 levels of look-ahead, ave. AU time = 1.09 us, ave. IAU time = 0.9 us, 1 Index Mem = 0.8 us; for Mesh Calc.). Vertical axis: speed. Horizontal axis: number of Main Memory boxes. Curves: with 0.6 us Fast Mem; with 2.0 us Instr. Mem; with data and instructions both in 2.0 us Main Mem; old "standard" times (transistor X-regs).

Graph 8: SIGMA Computer Speed vs. Index Core Memory Cycle Times for various Arithmetic Times (4 Main Mems 2.0 us, 2 Fast Mems 0.6 us; for Mesh Calc.). Vertical axis: speed. Horizontal axis: Index Core Memory total cycle time (usec).

Graph 9: SIGMA Computer Speed vs. Read-Out Time of Data Memory (ave. AU time 1.09 us, ave. IAU time 0.9 us, 4 levels of look-ahead). Vertical axis: speed. Horizontal axis: read-out time of the 2.0 us Main (Data) Memory. Curves: Mesh Calc., Monte Carlo.
