Book Introduction

Introduction to Parallel Computing: PDF | EPUB | TXT | Kindle e-book download (cloud drive)

并行计算导论 (Introduction to Parallel Computing)
  • Author: Grama, A. (USA)
  • Publisher: China Machine Press, Beijing
  • ISBN: 7111125126
  • Publication year: 2003
  • Listed page count: 636 pages
  • File size: 33 MB
  • File page count: 657 pages
  • Subject: Parallel algorithms - Higher education - Textbook - English

PDF Download


Click here for the online PDF e-book download of this book [recommended - cloud decompression - quick and convenient]. Downloads the book directly in PDF format; works on both mobile and PC.
Torrent download [fast BT download]. Tip: please use the BT download client FDM (see the client download page). Direct-link download [convenient but slow]   [Read the book online]   [Get the extraction password online]

Download Notes

Introduction to Parallel Computing, PDF e-book download

The downloaded file is a RAR archive; use decompression software to extract the PDF version of the book.

We recommend downloading with the BT client Free Download Manager (FDM for short), which is free, ad-free, and cross-platform. All resources on this site are packaged as BT torrents, so a dedicated BT client such as BitComet, qBittorrent, or uTorrent is required. Xunlei (Thunder) is currently not recommended, because this site's resources are not yet popular on that network; once they become popular, Xunlei will also work for downloading.

(The file page count should be greater than the listed page count, except for multi-volume e-books.)

Note: all archives on this site require an extraction password. Click here to download the archive extraction tool.

Table of Contents

CHAPTER 1 Introduction to Parallel Computing1

1.1 Motivating Parallelism2

1.1.1 The Computational Power Argument - from Transistors to FLOPS2

1.1.2 The Memory/Disk Speed Argument3

1.1.3 The Data Communication Argument4

1.2 Scope of Parallel Computing4

1.2.1 Applications in Engineering and Design4

1.2.2 Scientific Applications5

1.2.3 Commercial Applications5

1.2.4 Applications in Computer Systems6

1.3 Organization and Contents of the Text6

1.4 Bibliographic Remarks8

Problems9

CHAPTER 2 Parallel Programming Platforms11

2.1 Implicit Parallelism: Trends in Microprocessor Architectures12

2.1.1 Pipelining and Superscalar Execution12

2.1.2 Very Long Instruction Word Processors15

2.2 Limitations of Memory System Performance16

2.2.1 Improving Effective Memory Latency Using Caches17

2.2.2 Impact of Memory Bandwidth18

2.2.3 Alternate Approaches for Hiding Memory Latency21

2.2.4 Tradeoffs of Multithreading and Prefetching23

2.3 Dichotomy of Parallel Computing Platforms24

2.3.1 Control Structure of Parallel Platforms25

2.3.2 Communication Model of Parallel Platforms27

2.4 Physical Organization of Parallel Platforms31

2.4.1 Architecture of an Ideal Parallel Computer31

2.4.2 Interconnection Networks for Parallel Computers32

2.4.3 Network Topologies33

2.4.4 Evaluating Static Interconnection Networks43

2.4.5 Evaluating Dynamic Interconnection Networks44

2.4.6 Cache Coherence in Multiprocessor Systems45

2.5 Communication Costs in Parallel Machines53

2.5.1 Message Passing Costs in Parallel Computers53

2.5.2 Communication Costs in Shared-Address-Space Machines61

2.6 Routing Mechanisms for Interconnection Networks63

2.7 Impact of Process-Processor Mapping and Mapping Techniques65

2.7.1 Mapping Techniques for Graphs66

2.7.2 Cost-Performance Tradeoffs73

2.8 Bibliographic Remarks74

Problems76

CHAPTER 3 Principles of Parallel Algorithm Design85

3.1 Preliminaries86

3.1.1 Decomposition, Tasks, and Dependency Graphs86

3.1.2 Granularity, Concurrency, and Task-Interaction89

3.1.3 Processes and Mapping93

3.1.4 Processes versus Processors94

3.2 Decomposition Techniques95

3.2.1 Recursive Decomposition95

3.2.2 Data Decomposition97

3.2.3 Exploratory Decomposition105

3.2.4 Speculative Decomposition107

3.2.5 Hybrid Decompositions109

3.3 Characteristics of Tasks and Interactions110

3.3.1 Characteristics of Tasks110

3.3.2 Characteristics of Inter-Task Interactions112

3.4 Mapping Techniques for Load Balancing115

3.4.1 Schemes for Static Mapping117

3.4.2 Schemes for Dynamic Mapping130

3.5 Methods for Containing Interaction Overheads132

3.5.1 Maximizing Data Locality132

3.5.2 Minimizing Contention and Hot Spots134

3.5.3 Overlapping Computations with Interactions135

3.5.4 Replicating Data or Computations136

3.5.5 Using Optimized Collective Interaction Operations137

3.5.6 Overlapping Interactions with Other Interactions138

3.6 Parallel Algorithm Models139

3.6.1 The Data-Parallel Model139

3.6.2 The Task Graph Model140

3.6.3 The Work Pool Model140

3.6.4 The Master-Slave Model141

3.6.5 The Pipeline or Producer-Consumer Model141

3.6.6 Hybrid Models142

3.7 Bibliographic Remarks142

Problems143

CHAPTER 4 Basic Communication Operations147

4.1 One-to-All Broadcast and All-to-One Reduction149

4.1.1 Ring or Linear Array149

4.1.2 Mesh152

4.1.3 Hypercube153

4.1.4 Balanced Binary Tree153

4.1.5 Detailed Algorithms154

4.1.6 Cost Analysis156

4.2 All-to-All Broadcast and Reduction157

4.2.1 Linear Array and Ring158

4.2.2 Mesh160

4.2.3 Hypercube161

4.2.4 Cost Analysis164

4.3 All-Reduce and Prefix-Sum Operations166

4.4 Scatter and Gather167

4.5 All-to-All Personalized Communication170

4.5.1 Ring173

4.5.2 Mesh174

4.5.3 Hypercube175

4.6 Circular Shift179

4.6.1 Mesh179

4.6.2 Hypercube181

4.7 Improving the Speed of Some Communication Operations184

4.7.1 Splitting and Routing Messages in Parts184

4.7.2 All-Port Communication186

4.8 Summary187

4.9 Bibliographic Remarks188

Problems190

CHAPTER 5 Analytical Modeling of Parallel Programs195

5.1 Sources of Overhead in Parallel Programs195

5.2 Performance Metrics for Parallel Systems197

5.2.1 Execution Time197

5.2.2 Total Parallel Overhead197

5.2.3 Speedup198

5.2.4 Efficiency202

5.2.5 Cost203

5.3 The Effect of Granularity on Performance205

5.4 Scalability of Parallel Systems208

5.4.1 Scaling Characteristics of Parallel Programs209

5.4.2 The Isoefficiency Metric of Scalability212

5.4.3 Cost-Optimality and the Isoefficiency Function217

5.4.4 A Lower Bound on the Isoefficiency Function217

5.4.5 The Degree of Concurrency and the Isoefficiency Function218

5.5 Minimum Execution Time and Minimum Cost-Optimal Execution Time218

5.6 Asymptotic Analysis of Parallel Programs221

5.7 Other Scalability Metrics222

5.8 Bibliographic Remarks226

Problems228

CHAPTER 6 Programming Using the Message-Passing Paradigm233

6.1 Principles of Message-Passing Programming233

6.2 The Building Blocks: Send and Receive Operations235

6.2.1 Blocking Message Passing Operations236

6.2.2 Non-Blocking Message Passing Operations239

6.3 MPI: the Message Passing Interface240

6.3.1 Starting and Terminating the MPI Library242

6.3.2 Communicators242

6.3.3 Getting Information243

6.3.4 Sending and Receiving Messages244

6.3.5 Example: Odd-Even Sort248

6.4 Topologies and Embedding250

6.4.1 Creating and Using Cartesian Topologies251

6.4.2 Example: Cannon's Matrix-Matrix Multiplication253

6.5 Overlapping Communication with Computation255

6.5.1 Non-Blocking Communication Operations255

6.6 Collective Communication and Computation Operations260

6.6.1 Barrier260

6.6.2 Broadcast260

6.6.3 Reduction261

6.6.4 Prefix263

6.6.5 Gather263

6.6.6 Scatter264

6.6.7 All-to-All265

6.6.8 Example: One-Dimensional Matrix-Vector Multiplication266

6.6.9 Example: Single-Source Shortest-Path268

6.6.10 Example: Sample Sort270

6.7 Groups and Communicators272

6.7.1 Example: Two-Dimensional Matrix-Vector Multiplication274

6.8 Bibliographic Remarks276

Problems277

CHAPTER 7 Programming Shared Address Space Platforms279

7.1 Thread Basics280

7.2 Why Threads?281

7.3 The POSIX Thread API282

7.4 Thread Basics: Creation and Termination282

7.5 Synchronization Primitives in Pthreads287

7.5.1 Mutual Exclusion for Shared Variables287

7.5.2 Condition Variables for Synchronization294

7.6 Controlling Thread and Synchronization Attributes298

7.6.1 Attributes Objects for Threads299

7.6.2 Attributes Objects for Mutexes300

7.7 Thread Cancellation301

7.8 Composite Synchronization Constructs302

7.8.1 Read-Write Locks302

7.8.2 Barriers307

7.9 Tips for Designing Asynchronous Programs310

7.10 OpenMP: a Standard for Directive Based Parallel Programming311

7.10.1 The OpenMP Programming Model312

7.10.2 Specifying Concurrent Tasks in OpenMP315

7.10.3 Synchronization Constructs in OpenMP322

7.10.4 Data Handling in OpenMP327

7.10.5 OpenMP Library Functions328

7.10.6 Environment Variables in OpenMP330

7.10.7 Explicit Threads versus OpenMP Based Programming331

7.11 Bibliographic Remarks332

Problems332

CHAPTER 8 Dense Matrix Algorithms337

8.1 Matrix-Vector Multiplication337

8.1.1 Rowwise 1-D Partitioning338

8.1.2 2-D Partitioning341

8.2 Matrix-Matrix Multiplication345

8.2.1 A Simple Parallel Algorithm346

8.2.2 Cannon's Algorithm347

8.2.3 The DNS Algorithm349

8.3 Solving a System of Linear Equations352

8.3.1 A Simple Gaussian Elimination Algorithm353

8.3.2 Gaussian Elimination with Partial Pivoting366

8.3.3 Solving a Triangular System: Back-Substitution369

8.3.4 Numerical Considerations in Solving Systems of Linear Equations370

8.4 Bibliographic Remarks371

Problems372

CHAPTER 9 Sorting379

9.1 Issues in Sorting on Parallel Computers380

9.1.1 Where the Input and Output Sequences are Stored380

9.1.2 How Comparisons are Performed380

9.2 Sorting Networks382

9.2.1 Bitonic Sort384

9.2.2 Mapping Bitonic Sort to a Hypercube and a Mesh387

9.3 Bubble Sort and its Variants394

9.3.1 Odd-Even Transposition395

9.3.2 Shellsort398

9.4 Quicksort399

9.4.1 Parallelizing Quicksort401

9.4.2 Parallel Formulation for a CRCW PRAM402

9.4.3 Parallel Formulation for Practical Architectures404

9.4.4 Pivot Selection411

9.5 Bucket and Sample Sort412

9.6 Other Sorting Algorithms414

9.6.1 Enumeration Sort414

9.6.2 Radix Sort415

9.7 Bibliographic Remarks416

Problems419

CHAPTER 10 Graph Algorithms429

10.1 Definitions and Representation429

10.2 Minimum Spanning Tree: Prim's Algorithm432

10.3 Single-Source Shortest Paths: Dijkstra's Algorithm436

10.4 All-Pairs Shortest Paths437

10.4.1 Dijkstra's Algorithm438

10.4.2 Floyd's Algorithm440

10.4.3 Performance Comparisons445

10.5 Transitive Closure445

10.6 Connected Components446

10.6.1 A Depth-First Search Based Algorithm446

10.7 Algorithms for Sparse Graphs450

10.7.1 Finding a Maximal Independent Set451

10.7.2 Single-Source Shortest Paths455

10.8 Bibliographic Remarks462

Problems465

CHAPTER 11 Search Algorithms for Discrete Optimization Problems469

11.1 Definitions and Examples469

11.2 Sequential Search Algorithms474

11.2.1 Depth-First Search Algorithms474

11.2.2 Best-First Search Algorithms478

11.3 Search Overhead Factor478

11.4 Parallel Depth-First Search480

11.4.1 Important Parameters of Parallel DFS482

11.4.2 A General Framework for Analysis of Parallel DFS485

11.4.3 Analysis of Load-Balancing Schemes488

11.4.4 Termination Detection490

11.4.5 Experimental Results492

11.4.6 Parallel Formulations of Depth-First Branch-and-Bound Search495

11.4.7 Parallel Formulations of IDA*496

11.5 Parallel Best-First Search496

11.6 Speedup Anomalies in Parallel Search Algorithms501

11.6.1 Analysis of Average Speedup in Parallel DFS502

11.7 Bibliographic Remarks505

Problems510

CHAPTER 12 Dynamic Programming515

12.1 Overview of Dynamic Programming515

12.2 Serial Monadic DP Formulations518

12.2.1 The Shortest-Path Problem518

12.2.2 The 0/1 Knapsack Problem520

12.3 Nonserial Monadic DP Formulations523

12.3.1 The Longest-Common-Subsequence Problem523

12.4 Serial Polyadic DP Formulations526

12.4.1 Floyd's All-Pairs Shortest-Paths Algorithm526

12.5 Nonserial Polyadic DP Formulations527

12.5.1 The Optimal Matrix-Parenthesization Problem527

12.6 Summary and Discussion530

12.7 Bibliographic Remarks531

Problems532

CHAPTER 13 Fast Fourier Transform537

13.1 The Serial Algorithm538

13.2 The Binary-Exchange Algorithm541

13.2.1 A Full Bandwidth Network541

13.2.2 Limited Bandwidth Network548

13.2.3 Extra Computations in Parallel FFT551

13.3 The Transpose Algorithm553

13.3.1 Two-Dimensional Transpose Algorithm553

13.3.2 The Generalized Transpose Algorithm556

13.4 Bibliographic Remarks560

Problems562

APPENDIX A Complexity of Functions and Order Analysis565

A.1 Complexity of Functions565

A.2 Order Analysis of Functions566

Bibliography569

Author Index611

Subject Index621
