March 23, 2003

Prescott / Nocona doubles in-flight uOps

( by Hans de Vries )


Editor's note: April 23, 2003

This article has been superseded by a newer version, which you can find here:

 Looking at Intel's Prescott die, part II


More uOps in flight improve the performance of hyper-threaded applications


We already saw in our previous article that many buffers on Intel's Prescott die have increased significantly in size. These larger buffers are needed to support more uOps (micro-operations) in flight in the microarchitecture. The Pentium 4 can have a total of 126 uOps in flight, which allows extensive out-of-order processing. Hyper-Threading increases the need for in-flight uOps: a Pentium 4 running two threads limits each thread to 63 in-flight uOps, which reduces its out-of-order capability and thus its performance. It now looks as if the new Prescott/Nocona will double the number of in-flight uOps to 256.
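As a quick illustration of the split described above, the following sketch computes the per-thread share of the in-flight window. It assumes an even, static partition between threads, as the text describes for the Pentium 4. Applying the same partition to a 256-entry window is only an assumption, and the function name is made up for this example.

```python
def per_thread_window(total_uops: int, threads: int) -> int:
    """Even static split of the in-flight uOp window (an assumed policy)."""
    return total_uops // threads


# Pentium 4 with Hyper-Threading: 126 uOps shared by two threads.
print(per_thread_window(126, 2))

# Hypothetical Prescott/Nocona case: 256 uOps shared by two threads.
print(per_thread_window(256, 2))
```

Under this assumption, a doubled window would give each of two threads roughly the window that a single thread has on the Pentium 4 today.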

The RAT history table in the Free List Manager


The size of this table gives a clue as to how many uOps can really be in flight. It remembers the contents of the RAT (Register Alias Table) for all in-flight uOps so that it can wind back in time when an event interrupts processing. The RAT holds, for each architectural x86 register, a pointer into the RF (Renamed Register File). The uOps write their result data and status flags into this RF. Each uOp has its own entry allocated in the RF, so the RF in the Pentium 4 has 126 entries. On the Prescott die, however, it looks as if there are two integer cores, each with its own 256-entry RF. This does not mean, however, that there can be 512 uOps in flight.


The RAT history table, however, does give a conclusive answer!
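To make the role of the history table concrete, here is a toy model of register renaming with rollback. It is a sketch for illustration only and not a description of Intel's actual circuit. The class name, the choice to log the overwritten mapping per uOp, and the free-list handling are all assumptions. Retirement is not modeled.

```python
class ToyRenamer:
    """Toy model: a RAT plus a history log used for rollback (hypothetical)."""

    def __init__(self, arch_regs, rf_entries):
        self.free = list(range(rf_entries))  # free entries of the renamed RF
        self.rat = {}                        # architectural register -> RF entry
        for reg in arch_regs:
            self.rat[reg] = self.free.pop(0)
        self.history = []                    # one record per in-flight uOp

    def rename(self, dest):
        """Allocate an RF entry for a uOp writing `dest` and log the old mapping."""
        new_entry = self.free.pop(0)
        self.history.append((dest, self.rat[dest], new_entry))
        self.rat[dest] = new_entry
        return new_entry

    def rollback(self, count):
        """Undo the youngest `count` uOps, restoring the RAT to its earlier state."""
        for _ in range(count):
            dest, old_entry, new_entry = self.history.pop()
            self.rat[dest] = old_entry
            self.free.insert(0, new_entry)


renamer = ToyRenamer(["eax", "ebx"], rf_entries=8)
before = dict(renamer.rat)
renamer.rename("eax")
renamer.rename("eax")
renamer.rename("ebx")
renamer.rollback(3)
print(renamer.rat == before)
```

In this model the history log needs one record per in-flight uOp, and each record holds RF pointers. That is why the size of such a table would track both the number of in-flight uOps and the pointer width.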


Measurements and Calculations:


Measured sizes of the RAT history table ("counting" pixels):


Northwood 130 nm:               0.731 mm x 0.477 mm   =   0.349 mm2

Prescott 90 nm:                      0.830 mm x 0.457 mm   =   0.379 mm2

"Scaled" Northwood 90 nm:   0.506 mm x 0.330 mm   =   0.167 mm2     (calculated)
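The figures above can be reproduced with a few lines of Python. The sketch assumes ideal linear scaling of both dimensions by 90/130 (that is, 9/13) for the process shrink. The variable names are chosen for this example only.

```python
SHRINK = 90 / 130  # assumed ideal linear shrink, 130 nm -> 90 nm (9/13)


def area(width_mm, height_mm):
    """Rectangle area in mm2."""
    return width_mm * height_mm


northwood = (0.731, 0.477)   # measured, 130 nm
prescott = (0.830, 0.457)    # measured, 90 nm
scaled = tuple(side * SHRINK for side in northwood)  # calculated

northwood_area = area(*northwood)
prescott_area = area(*prescott)
scaled_area = area(*scaled)

print(f"Northwood 130 nm:        {northwood_area:.3f} mm2")
print(f"Prescott 90 nm:          {prescott_area:.3f} mm2")
print(f"Scaled Northwood 90 nm:  {scaled[0]:.3f} x {scaled[1]:.3f} = {scaled_area:.3f} mm2")
```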


So we see that the RAT history table on the 90 nm Prescott is slightly larger than the one on the 130 nm Northwood.

And now the calculations: if the Prescott can have 256 uOps in flight, then the table should be a factor 256/126 larger.

Furthermore, if the RF(s) have 256 entries, then the pointers grow from 7 to 8 bits, making the table an extra factor 8/7 larger. So, if Northwood's RAT history table scaled linearly from 0.349 mm2 by a factor (9/13)^2 to 0.167 mm2, then we would expect a Prescott version with a size of roughly 0.167 mm2 * 8/7 * 256/126 = 0.388 mm2


Calculated versus measured is 0.388 mm2 : 0.379 mm2, giving a ratio of 1.023
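The prediction and the ratio can be checked as follows. The sketch uses the same simplifying assumptions as the text: ideal (9/13)^2 area scaling, and a table area proportional to both the number of entries and the pointer width. The function name is hypothetical.

```python
def predicted_area(scaled_area_mm2, old_entries, new_entries, old_bits, new_bits):
    """Scale a table area by entry count and pointer width (assumed proportional)."""
    return scaled_area_mm2 * (new_bits / old_bits) * (new_entries / old_entries)


# Northwood table shrunk from 130 nm to 90 nm, assuming ideal (9/13)^2 scaling.
scaled_northwood = 0.731 * 0.477 * (90 / 130) ** 2

# Hypothesis: 256 in-flight uOps with 8-bit pointers instead of 126 with 7-bit.
predicted = predicted_area(scaled_northwood, 126, 256, 7, 8)
measured = 0.830 * 0.457

ratio = predicted / measured
print(f"predicted {predicted:.3f} mm2, measured {measured:.3f} mm2, ratio {ratio:.3f}")
```

With the default hypothesis the prediction lands within a few percent of the measurement. Changing the entry count or the pointer width in the call shows how quickly other hypotheses drift away from the measured area.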


That is very close indeed, given the assumed linear scaling from 130 nm to 90 nm, the cutting out of vague blue and artificially colored rectangles, and the not entirely correct assumption that memories scale in size in proportion to the number of entries or bits. (We should suspect that the measurement error is significantly larger than 2.3%, so we must have had some luck here.)


Support for 4 threads in Nocona


Nocona will support four threads if our observation of two 256-entry RFs holds. It means that each one can be used for two threads, giving a total of four threads; however, the maximum number of uOps in flight supported is 256. (Edit March 25: There are two cores, each with its own RF, but it looks like the second cannot run an independent thread. See article: The clues for Yamhill)