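Book Information

Hadoop: The Definitive Guide (English edition), e-book in PDF, EPUB, TXT, and Kindle formats

Hadoop: The Definitive Guide (English edition)
  • Author: Tom White
  • Publisher: Southeast University Press, Nanjing
  • ISBN: 9787564138936
  • Publication year: 2013
  • Listed page count: 662
  • File size: 63 MB
  • File page count: 686
  • Subject terms: data processing; application software; English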


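Download Notes

The downloaded file is a RAR archive. Extract it with an archive tool to obtain the book in PDF format.

All resources on this site are packaged as BT torrents, so a dedicated BT client is required. Free Download Manager (FDM) is recommended: it is free, has no ads, and runs on multiple platforms. Other BT clients such as BitComet, qBittorrent, and uTorrent also work. Xunlei is not recommended for now because the resources on this site are not yet widely seeded; once they are, Xunlei can be used as well.

(The file page count should be greater than the listed page count, except for e-books split into multiple volumes.)

Note: every archive on this site requires an extraction code.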

Table of Contents

1. Meet Hadoop

Data!

Data Storage and Analysis

Comparison with Other Systems

Relational Database Management System

Grid Computing

Volunteer Computing

A Brief History of Hadoop

Apache Hadoop and the Hadoop Ecosystem

Hadoop Releases

What's Covered in This Book

Compatibility

2. MapReduce

A Weather Dataset

Data Format

Analyzing the Data with Unix Tools

Analyzing the Data with Hadoop

Map and Reduce

Java MapReduce

Scaling Out

Data Flow

Combiner Functions

Running a Distributed MapReduce Job

Hadoop Streaming

Ruby

Python

Hadoop Pipes

Compiling and Running

3. The Hadoop Distributed Filesystem

The Design of HDFS

HDFS Concepts

Blocks

Namenodes and Datanodes

HDFS Federation

HDFS High-Availability

The Command-Line Interface

Basic Filesystem Operations

Hadoop Filesystems

Interfaces

The Java Interface

Reading Data from a Hadoop URL

Reading Data Using the FileSystem API

Writing Data

Directories

Querying the Filesystem

Deleting Data

Data Flow

Anatomy of a File Read

Anatomy of a File Write

Coherency Model

Data Ingest with Flume and Sqoop

Parallel Copying with distcp

Keeping an HDFS Cluster Balanced

Hadoop Archives

Using Hadoop Archives

Limitations

4. Hadoop I/O

Data Integrity

Data Integrity in HDFS

LocalFileSystem

ChecksumFileSystem

Compression

Codecs

Compression and Input Splits

Using Compression in MapReduce

Serialization

The Writable Interface

Writable Classes

Implementing a Custom Writable

Serialization Frameworks

Avro

Avro Data Types and Schemas

In-Memory Serialization and Deserialization

Avro Datafiles

Interoperability

Schema Resolution

Sort Order

Avro MapReduce

Sorting Using Avro MapReduce

Avro MapReduce in Other Languages

File-Based Data Structures

SequenceFile

MapFile

5. Developing a MapReduce Application

The Configuration API

Combining Resources

Variable Expansion

Setting Up the Development Environment

Managing Configuration

GenericOptionsParser, Tool, and ToolRunner

Writing a Unit Test with MRUnit

Mapper

Reducer

Running Locally on Test Data

Running a Job in a Local Job Runner

Testing the Driver

Running on a Cluster

Packaging a Job

Launching a Job

The MapReduce Web UI

Retrieving the Results

Debugging a Job

Hadoop Logs

Remote Debugging

Tuning a Job

Profiling Tasks

MapReduce Workflows

Decomposing a Problem into MapReduce Jobs

JobControl

Apache Oozie

6. How MapReduce Works

Anatomy of a MapReduce Job Run

Classic MapReduce (MapReduce 1)

YARN (MapReduce 2)

Failures

Failures in Classic MapReduce

Failures in YARN

Job Scheduling

The Fair Scheduler

The Capacity Scheduler

Shuffle and Sort

The Map Side

The Reduce Side

Configuration Tuning

Task Execution

The Task Execution Environment

Speculative Execution

Output Committers

Task JVM Reuse

Skipping Bad Records

7. MapReduce Types and Formats

MapReduce Types

The Default MapReduce Job

Input Formats

Input Splits and Records

Text Input

Binary Input

Multiple Inputs

Database Input (and Output)

Output Formats

Text Output

Binary Output

Multiple Outputs

Lazy Output

Database Output

8. MapReduce Features

Counters

Built-in Counters

User-Defined Java Counters

User-Defined Streaming Counters

Sorting

Preparation

Partial Sort

Total Sort

Secondary Sort

Joins

Map-Side Joins

Reduce-Side Joins

Side Data Distribution

Using the Job Configuration

Distributed Cache

MapReduce Library Classes

9. Setting Up a Hadoop Cluster

Cluster Specification

Network Topology

Cluster Setup and Installation

Installing Java

Creating a Hadoop User

Installing Hadoop

Testing the Installation

SSH Configuration

Hadoop Configuration

Configuration Management

Environment Settings

Important Hadoop Daemon Properties

Hadoop Daemon Addresses and Ports

Other Hadoop Properties

User Account Creation

YARN Configuration

Important YARN Daemon Properties

YARN Daemon Addresses and Ports

Security

Kerberos and Hadoop

Delegation Tokens

Other Security Enhancements

Benchmarking a Hadoop Cluster

Hadoop Benchmarks

User Jobs

Hadoop in the Cloud

Apache Whirr

10. Administering Hadoop

HDFS

Persistent Data Structures

Safe Mode

Audit Logging

Tools

Monitoring

Logging

Metrics

Java Management Extensions

Maintenance

Routine Administration Procedures

Commissioning and Decommissioning Nodes

Upgrades

11. Pig

Installing and Running Pig

Execution Types

Running Pig Programs

Grunt

Pig Latin Editors

An Example

Generating Examples

Comparison with Databases

Pig Latin

Structure

Statements

Expressions

Types

Schemas

Functions

Macros

User-Defined Functions

A Filter UDF

An Eval UDF

A Load UDF

Data Processing Operators

Loading and Storing Data

Filtering Data

Grouping and Joining Data

Sorting Data

Combining and Splitting Data

Pig in Practice

Parallelism

Parameter Substitution

12. Hive

Installing Hive

The Hive Shell

An Example

Running Hive

Configuring Hive

Hive Services

The Metastore

Comparison with Traditional Databases

Schema on Read Versus Schema on Write

Updates, Transactions, and Indexes

HiveQL

Data Types

Operators and Functions

Tables

Managed Tables and External Tables

Partitions and Buckets

Storage Formats

Importing Data

Altering Tables

Dropping Tables

Querying Data

Sorting and Aggregating

MapReduce Scripts

Joins

Subqueries

Views

User-Defined Functions

Writing a UDF

Writing a UDAF

13. HBase

HBasics

Backdrop

Concepts

Whirlwind Tour of the Data Model

Implementation

Installation

Test Drive

Clients

Java

Avro, REST, and Thrift

Example

Schemas

Loading Data

Web Queries

HBase Versus RDBMS

Successful Service

HBase

Use Case: HBase at Streamy.com

Praxis

Versions

HDFS

UI

Metrics

Schema Design

Counters

Bulk Load

14. ZooKeeper

Installing and Running ZooKeeper

An Example

Group Membership in ZooKeeper

Creating the Group

Joining a Group

Listing Members in a Group

Deleting a Group

The ZooKeeper Service

Data Model

Operations

Implementation

Consistency

Sessions

States

Building Applications with ZooKeeper

A Configuration Service

The Resilient ZooKeeper Application

A Lock Service

More Distributed Data Structures and Protocols

ZooKeeper in Production

Resilience and Performance

Configuration

15. Sqoop

Getting Sqoop

Sqoop Connectors

A Sample Import

Text and Binary File Formats

Generated Code

Additional Serialization Systems

Imports: A Deeper Look

Controlling the Import

Imports and Consistency

Direct-mode Imports

Working with Imported Data

Imported Data and Hive

Importing Large Objects

Performing an Export

Exports: A Deeper Look

Exports and Transactionality

Exports and SequenceFiles

16. Case Studies

Hadoop Usage at Last.fm

Last.fm: The Social Music Revolution

Hadoop at Last.fm

Generating Charts with Hadoop

The Track Statistics Program

Summary

Hadoop and Hive at Facebook

Hadoop at Facebook

Hypothetical Use Case Studies

Hive

Problems and Future Work

Nutch Search Engine

Data Structures

Selected Examples of Hadoop Data Processing in Nutch

Summary

Log Processing at Rackspace

Requirements/The Problem

Brief History

Choosing Hadoop

Collection and Storage

MapReduce for Logs

Cascading

Fields, Tuples, and Pipes

Operations

Taps, Schemes, and Flows

Cascading in Practice

Flexibility

Hadoop and Cascading at ShareThis

Summary

TeraByte Sort on Apache Hadoop

Using Pig and Wukong to Explore Billion-edge Network Graphs

Measuring Community

Everybody's Talkin' at Me: The Twitter Reply Graph

Symmetric Links

Community Extraction

A. Installing Apache Hadoop

B. Cloudera's Distribution Including Apache Hadoop

C. Preparing the NCDC Weather Data

Index
