DOI: 10.3724/SP.J.1016.2009.00142

Chinese Journal of Computers (计算机学报) 2009/32:1 PP.142-151

High-Bandwidth Memory Accessing Pipeline of General Purpose Processor

Processor speed and memory capacity have increased at a near-exponential rate. Memory latency, however, has not improved as dramatically, and access time increasingly limits system performance. Low load-to-use latency is key to achieving high memory performance, and increasing the bandwidth of the memory pipeline always helps. But higher bandwidth brings more complexity and consumes more power. Based on an analysis of applications, this work seeks the headroom in the performance of the memory pipeline. The authors identify several useful characteristics of memory operations and present an optimized design of a high-bandwidth memory pipeline with low complexity, low latency, and low power. These decisions guided the design of the Godsonx processor: although the bandwidth of memory access is doubled and performance increases by 8.6%, the extra area is only 1.7% of the original design.

Key words: high bandwidth, memory pipeline, cache, TLB

ReleaseDate:2014-07-21 14:41:22
