Book Introduction
Introduction to Parallel Computing (并行计算导论): PDF / EPUB / txt / Kindle e-book editions for download
![Introduction to Parallel Computing cover](https://www.shukui.net/cover/55/32381976.jpg)
- Author: Grama, A. (USA)
- Publisher: Beijing: China Machine Press
- ISBN: 7111125126
- Publication year: 2003
- Listed page count: 636
- File size: 33 MB
- File page count: 657
- Subject headings: parallel algorithms; higher education; textbook; English
PDF Download
Download Notes
Download the PDF e-book edition of Introduction to Parallel Computing
The downloaded file is a RAR archive; unpack it with extraction software to obtain the PDF. We recommend the download tool Free Download Manager (FDM for short), which is free, ad-free, and cross-platform. All resources on this site are packaged as BitTorrent (BT) seeds, so a dedicated BT client such as BitComet, qBittorrent, or uTorrent is required. Thunder (迅雷) is not recommended for now, because this title is not a popular resource; once it becomes popular, Thunder will also work if you have it installed.
(The file page count should be greater than the listed page count, except for multi-volume e-books.)
Note: All archives on this site share an extraction code. Click to download the archive extraction tool.
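If you prefer to script the unpack step instead of using a GUI tool, here is a minimal Python sketch. The archive name, destination folder, and extraction code below are hypothetical placeholders (the real code comes with the site's extraction tool above), and it assumes the `unrar` command-line utility is installed and on your PATH:

```python
# Minimal sketch of the unpack step. All values are illustrative
# placeholders, not values provided by the site.
import subprocess
from pathlib import Path

archive = Path("book.rar")          # placeholder: your downloaded archive
extraction_code = "YOUR-CODE-HERE"  # placeholder: the site's extraction code
dest = Path("extracted")
dest.mkdir(exist_ok=True)

# `unrar x` extracts with full paths; -p<code> supplies the password.
subprocess.run(
    ["unrar", "x", f"-p{extraction_code}", str(archive), f"{dest}/"],
    check=True,  # raise an error if extraction fails
)

# Show the recovered PDF file(s).
print([str(p) for p in dest.rglob("*.pdf")])
```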
Table of Contents
CHAPTER 1 Introduction to Parallel Computing
1.1 Motivating Parallelism
1.1.1 The Computational Power Argument: From Transistors to FLOPS
1.1.2 The Memory/Disk Speed Argument
1.1.3 The Data Communication Argument
1.2 Scope of Parallel Computing
1.2.1 Applications in Engineering and Design
1.2.2 Scientific Applications
1.2.3 Commercial Applications
1.2.4 Applications in Computer Systems
1.3 Organization and Contents of the Text
1.4 Bibliographic Remarks
Problems
CHAPTER 2 Parallel Programming Platforms
2.1 Implicit Parallelism: Trends in Microprocessor Architectures
2.1.1 Pipelining and Superscalar Execution
2.1.2 Very Long Instruction Word Processors
2.2 Limitations of Memory System Performance
2.2.1 Improving Effective Memory Latency Using Caches
2.2.2 Impact of Memory Bandwidth
2.2.3 Alternate Approaches for Hiding Memory Latency
2.2.4 Tradeoffs of Multithreading and Prefetching
2.3 Dichotomy of Parallel Computing Platforms
2.3.1 Control Structure of Parallel Platforms
2.3.2 Communication Model of Parallel Platforms
2.4 Physical Organization of Parallel Platforms
2.4.1 Architecture of an Ideal Parallel Computer
2.4.2 Interconnection Networks for Parallel Computers
2.4.3 Network Topologies
2.4.4 Evaluating Static Interconnection Networks
2.4.5 Evaluating Dynamic Interconnection Networks
2.4.6 Cache Coherence in Multiprocessor Systems
2.5 Communication Costs in Parallel Machines
2.5.1 Message Passing Costs in Parallel Computers
2.5.2 Communication Costs in Shared-Address-Space Machines
2.6 Routing Mechanisms for Interconnection Networks
2.7 Impact of Process-Processor Mapping and Mapping Techniques
2.7.1 Mapping Techniques for Graphs
2.7.2 Cost-Performance Tradeoffs
2.8 Bibliographic Remarks
Problems
CHAPTER 3 Principles of Parallel Algorithm Design
3.1 Preliminaries
3.1.1 Decomposition, Tasks, and Dependency Graphs
3.1.2 Granularity, Concurrency, and Task-Interaction
3.1.3 Processes and Mapping
3.1.4 Processes versus Processors
3.2 Decomposition Techniques
3.2.1 Recursive Decomposition
3.2.2 Data Decomposition
3.2.3 Exploratory Decomposition
3.2.4 Speculative Decomposition
3.2.5 Hybrid Decompositions
3.3 Characteristics of Tasks and Interactions
3.3.1 Characteristics of Tasks
3.3.2 Characteristics of Inter-Task Interactions
3.4 Mapping Techniques for Load Balancing
3.4.1 Schemes for Static Mapping
3.4.2 Schemes for Dynamic Mapping
3.5 Methods for Containing Interaction Overheads
3.5.1 Maximizing Data Locality
3.5.2 Minimizing Contention and Hot Spots
3.5.3 Overlapping Computations with Interactions
3.5.4 Replicating Data or Computations
3.5.5 Using Optimized Collective Interaction Operations
3.5.6 Overlapping Interactions with Other Interactions
3.6 Parallel Algorithm Models
3.6.1 The Data-Parallel Model
3.6.2 The Task Graph Model
3.6.3 The Work Pool Model
3.6.4 The Master-Slave Model
3.6.5 The Pipeline or Producer-Consumer Model
3.6.6 Hybrid Models
3.7 Bibliographic Remarks
Problems
CHAPTER 4 Basic Communication Operations
4.1 One-to-All Broadcast and All-to-One Reduction
4.1.1 Ring or Linear Array
4.1.2 Mesh
4.1.3 Hypercube
4.1.4 Balanced Binary Tree
4.1.5 Detailed Algorithms
4.1.6 Cost Analysis
4.2 All-to-All Broadcast and Reduction
4.2.1 Linear Array and Ring
4.2.2 Mesh
4.2.3 Hypercube
4.2.4 Cost Analysis
4.3 All-Reduce and Prefix-Sum Operations
4.4 Scatter and Gather
4.5 All-to-All Personalized Communication
4.5.1 Ring
4.5.2 Mesh
4.5.3 Hypercube
4.6 Circular Shift
4.6.1 Mesh
4.6.2 Hypercube
4.7 Improving the Speed of Some Communication Operations
4.7.1 Splitting and Routing Messages in Parts
4.7.2 All-Port Communication
4.8 Summary
4.9 Bibliographic Remarks
Problems
CHAPTER 5 Analytical Modeling of Parallel Programs
5.1 Sources of Overhead in Parallel Programs
5.2 Performance Metrics for Parallel Systems
5.2.1 Execution Time
5.2.2 Total Parallel Overhead
5.2.3 Speedup
5.2.4 Efficiency
5.2.5 Cost
5.3 The Effect of Granularity on Performance
5.4 Scalability of Parallel Systems
5.4.1 Scaling Characteristics of Parallel Programs
5.4.2 The Isoefficiency Metric of Scalability
5.4.3 Cost-Optimality and the Isoefficiency Function
5.4.4 A Lower Bound on the Isoefficiency Function
5.4.5 The Degree of Concurrency and the Isoefficiency Function
5.5 Minimum Execution Time and Minimum Cost-Optimal Execution Time
5.6 Asymptotic Analysis of Parallel Programs
5.7 Other Scalability Metrics
5.8 Bibliographic Remarks
Problems
CHAPTER 6 Programming Using the Message-Passing Paradigm
6.1 Principles of Message-Passing Programming
6.2 The Building Blocks: Send and Receive Operations
6.2.1 Blocking Message Passing Operations
6.2.2 Non-Blocking Message Passing Operations
6.3 MPI: The Message Passing Interface
6.3.1 Starting and Terminating the MPI Library
6.3.2 Communicators
6.3.3 Getting Information
6.3.4 Sending and Receiving Messages
6.3.5 Example: Odd-Even Sort
6.4 Topologies and Embedding
6.4.1 Creating and Using Cartesian Topologies
6.4.2 Example: Cannon's Matrix-Matrix Multiplication
6.5 Overlapping Communication with Computation
6.5.1 Non-Blocking Communication Operations
6.6 Collective Communication and Computation Operations
6.6.1 Barrier
6.6.2 Broadcast
6.6.3 Reduction
6.6.4 Prefix
6.6.5 Gather
6.6.6 Scatter
6.6.7 All-to-All
6.6.8 Example: One-Dimensional Matrix-Vector Multiplication
6.6.9 Example: Single-Source Shortest-Path
6.6.10 Example: Sample Sort
6.7 Groups and Communicators
6.7.1 Example: Two-Dimensional Matrix-Vector Multiplication
6.8 Bibliographic Remarks
Problems
CHAPTER 7 Programming Shared Address Space Platforms
7.1 Thread Basics
7.2 Why Threads?
7.3 The POSIX Thread API
7.4 Thread Basics: Creation and Termination
7.5 Synchronization Primitives in Pthreads
7.5.1 Mutual Exclusion for Shared Variables
7.5.2 Condition Variables for Synchronization
7.6 Controlling Thread and Synchronization Attributes
7.6.1 Attributes Objects for Threads
7.6.2 Attributes Objects for Mutexes
7.7 Thread Cancellation
7.8 Composite Synchronization Constructs
7.8.1 Read-Write Locks
7.8.2 Barriers
7.9 Tips for Designing Asynchronous Programs
7.10 OpenMP: A Standard for Directive-Based Parallel Programming
7.10.1 The OpenMP Programming Model
7.10.2 Specifying Concurrent Tasks in OpenMP
7.10.3 Synchronization Constructs in OpenMP
7.10.4 Data Handling in OpenMP
7.10.5 OpenMP Library Functions
7.10.6 Environment Variables in OpenMP
7.10.7 Explicit Threads versus OpenMP Based Programming
7.11 Bibliographic Remarks
Problems
CHAPTER 8 Dense Matrix Algorithms
8.1 Matrix-Vector Multiplication
8.1.1 Rowwise 1-D Partitioning
8.1.2 2-D Partitioning
8.2 Matrix-Matrix Multiplication
8.2.1 A Simple Parallel Algorithm
8.2.2 Cannon's Algorithm
8.2.3 The DNS Algorithm
8.3 Solving a System of Linear Equations
8.3.1 A Simple Gaussian Elimination Algorithm
8.3.2 Gaussian Elimination with Partial Pivoting
8.3.3 Solving a Triangular System: Back-Substitution
8.3.4 Numerical Considerations in Solving Systems of Linear Equations
8.4 Bibliographic Remarks
Problems
CHAPTER 9 Sorting
9.1 Issues in Sorting on Parallel Computers
9.1.1 Where the Input and Output Sequences are Stored
9.1.2 How Comparisons are Performed
9.2 Sorting Networks
9.2.1 Bitonic Sort
9.2.2 Mapping Bitonic Sort to a Hypercube and a Mesh
9.3 Bubble Sort and its Variants
9.3.1 Odd-Even Transposition
9.3.2 Shellsort
9.4 Quicksort
9.4.1 Parallelizing Quicksort
9.4.2 Parallel Formulation for a CRCW PRAM
9.4.3 Parallel Formulation for Practical Architectures
9.4.4 Pivot Selection
9.5 Bucket and Sample Sort
9.6 Other Sorting Algorithms
9.6.1 Enumeration Sort
9.6.2 Radix Sort
9.7 Bibliographic Remarks
Problems
CHAPTER 10 Graph Algorithms
10.1 Definitions and Representation
10.2 Minimum Spanning Tree: Prim's Algorithm
10.3 Single-Source Shortest Paths: Dijkstra's Algorithm
10.4 All-Pairs Shortest Paths
10.4.1 Dijkstra's Algorithm
10.4.2 Floyd's Algorithm
10.4.3 Performance Comparisons
10.5 Transitive Closure
10.6 Connected Components
10.6.1 A Depth-First Search Based Algorithm
10.7 Algorithms for Sparse Graphs
10.7.1 Finding a Maximal Independent Set
10.7.2 Single-Source Shortest Paths
10.8 Bibliographic Remarks
Problems
CHAPTER 11 Search Algorithms for Discrete Optimization Problems
11.1 Definitions and Examples
11.2 Sequential Search Algorithms
11.2.1 Depth-First Search Algorithms
11.2.2 Best-First Search Algorithms
11.3 Search Overhead Factor
11.4 Parallel Depth-First Search
11.4.1 Important Parameters of Parallel DFS
11.4.2 A General Framework for Analysis of Parallel DFS
11.4.3 Analysis of Load-Balancing Schemes
11.4.4 Termination Detection
11.4.5 Experimental Results
11.4.6 Parallel Formulations of Depth-First Branch-and-Bound Search
11.4.7 Parallel Formulations of IDA*
11.5 Parallel Best-First Search
11.6 Speedup Anomalies in Parallel Search Algorithms
11.6.1 Analysis of Average Speedup in Parallel DFS
11.7 Bibliographic Remarks
Problems
CHAPTER 12 Dynamic Programming
12.1 Overview of Dynamic Programming
12.2 Serial Monadic DP Formulations
12.2.1 The Shortest-Path Problem
12.2.2 The 0/1 Knapsack Problem
12.3 Nonserial Monadic DP Formulations
12.3.1 The Longest-Common-Subsequence Problem
12.4 Serial Polyadic DP Formulations
12.4.1 Floyd's All-Pairs Shortest-Paths Algorithm
12.5 Nonserial Polyadic DP Formulations
12.5.1 The Optimal Matrix-Parenthesization Problem
12.6 Summary and Discussion
12.7 Bibliographic Remarks
Problems
CHAPTER 13 Fast Fourier Transform
13.1 The Serial Algorithm
13.2 The Binary-Exchange Algorithm
13.2.1 A Full Bandwidth Network
13.2.2 Limited Bandwidth Network
13.2.3 Extra Computations in Parallel FFT
13.3 The Transpose Algorithm
13.3.1 Two-Dimensional Transpose Algorithm
13.3.2 The Generalized Transpose Algorithm
13.4 Bibliographic Remarks
Problems
APPENDIX A Complexity of Functions and Order Analysis
A.1 Complexity of Functions
A.2 Order Analysis of Functions
Bibliography
Author Index
Subject Index