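Book Information
Hadoop: The Definitive Guide (English edition)
- Author: Tom White
- Publisher: Southeast University Press, Nanjing
- ISBN:9787564138936
- Publication year: 2013
- Labeled page count: 662
- File size: 63 MB
- File page count: 686
- Subject terms: data processing; application software; English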
Table of Contents
1. Meet Hadoop
Data!
Data Storage and Analysis
Comparison with Other Systems
Relational Database Management System
Grid Computing
Volunteer Computing
A Brief History of Hadoop
Apache Hadoop and the Hadoop Ecosystem
Hadoop Releases
What's Covered in This Book
Compatibility
2. MapReduce
A Weather Dataset
Data Format
Analyzing the Data with Unix Tools
Analyzing the Data with Hadoop
Map and Reduce
Java MapReduce
Scaling Out
Data Flow
Combiner Functions
Running a Distributed MapReduce Job
Hadoop Streaming
Ruby
Python
Hadoop Pipes
Compiling and Running
3. The Hadoop Distributed Filesystem
The Design of HDFS
HDFS Concepts
Blocks
Namenodes and Datanodes
HDFS Federation
HDFS High-Availability
The Command-Line Interface
Basic Filesystem Operations
Hadoop Filesystems
Interfaces
The Java Interface
Reading Data from a Hadoop URL
Reading Data Using the FileSystem API
Writing Data
Directories
Querying the Filesystem
Deleting Data
Data Flow
Anatomy of a File Read
Anatomy of a File Write
Coherency Model
Data Ingest with Flume and Sqoop
Parallel Copying with distcp
Keeping an HDFS Cluster Balanced
Hadoop Archives
Using Hadoop Archives
Limitations
4. Hadoop I/O
Data Integrity
Data Integrity in HDFS
LocalFileSystem
ChecksumFileSystem
Compression
Codecs
Compression and Input Splits
Using Compression in MapReduce
Serialization
The Writable Interface
Writable Classes
Implementing a Custom Writable
Serialization Frameworks
Avro
Avro Data Types and Schemas
In-Memory Serialization and Deserialization
Avro Datafiles
Interoperability
Schema Resolution
Sort Order
Avro MapReduce
Sorting Using Avro MapReduce
Avro MapReduce in Other Languages
File-Based Data Structures
SequenceFile
MapFile
5. Developing a MapReduce Application
The Configuration API
Combining Resources
Variable Expansion
Setting Up the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test with MRUnit
Mapper
Reducer
Running Locally on Test Data
Running a Job in a Local Job Runner
Testing the Driver
Running on a Cluster
Packaging a Job
Launching a Job
The MapReduce Web UI
Retrieving the Results
Debugging a Job
Hadoop Logs
Remote Debugging
Tuning a Job
Profiling Tasks
MapReduce Workflows
Decomposing a Problem into MapReduce Jobs
JobControl
Apache Oozie
6. How MapReduce Works
Anatomy of a MapReduce Job Run
Classic MapReduce (MapReduce 1)
YARN (MapReduce 2)
Failures
Failures in Classic MapReduce
Failures in YARN
Job Scheduling
The Fair Scheduler
The Capacity Scheduler
Shuffle and Sort
The Map Side
The Reduce Side
Configuration Tuning
Task Execution
The Task Execution Environment
Speculative Execution
Output Committers
Task JVM Reuse
Skipping Bad Records
7. MapReduce Types and Formats
MapReduce Types
The Default MapReduce Job
Input Formats
Input Splits and Records
Text Input
Binary Input
Multiple Inputs
Database Input (and Output)
Output Formats
Text Output
Binary Output
Multiple Outputs
Lazy Output
Database Output
8. MapReduce Features
Counters
Built-in Counters
User-Defined Java Counters
User-Defined Streaming Counters
Sorting
Preparation
Partial Sort
Total Sort
Secondary Sort
Joins
Map-Side Joins
Reduce-Side Joins
Side Data Distribution
Using the Job Configuration
Distributed Cache
MapReduce Library Classes
9. Setting Up a Hadoop Cluster
Cluster Specification
Network Topology
Cluster Setup and Installation
Installing Java
Creating a Hadoop User
Installing Hadoop
Testing the Installation
SSH Configuration
Hadoop Configuration
Configuration Management
Environment Settings
Important Hadoop Daemon Properties
Hadoop Daemon Addresses and Ports
Other Hadoop Properties
User Account Creation
YARN Configuration
Important YARN Daemon Properties
YARN Daemon Addresses and Ports
Security
Kerberos and Hadoop
Delegation Tokens
Other Security Enhancements
Benchmarking a Hadoop Cluster
Hadoop Benchmarks
User Jobs
Hadoop in the Cloud
Apache Whirr
10. Administering Hadoop
HDFS
Persistent Data Structures
Safe Mode
Audit Logging
Tools
Monitoring
Logging
Metrics
Java Management Extensions
Maintenance
Routine Administration Procedures
Commissioning and Decommissioning Nodes
Upgrades
11. Pig
Installing and Running Pig
Execution Types
Running Pig Programs
Grunt
Pig Latin Editors
An Example
Generating Examples
Comparison with Databases
Pig Latin
Structure
Statements
Expressions
Types
Schemas
Functions
Macros
User-Defined Functions
A Filter UDF
An Eval UDF
A Load UDF
Data Processing Operators
Loading and Storing Data
Filtering Data
Grouping and Joining Data
Sorting Data
Combining and Splitting Data
Pig in Practice
Parallelism
Parameter Substitution
12. Hive
Installing Hive
The Hive Shell
An Example
Running Hive
Configuring Hive
Hive Services
The Metastore
Comparison with Traditional Databases
Schema on Read Versus Schema on Write
Updates, Transactions, and Indexes
HiveQL
Data Types
Operators and Functions
Tables
Managed Tables and External Tables
Partitions and Buckets
Storage Formats
Importing Data
Altering Tables
Dropping Tables
Querying Data
Sorting and Aggregating
MapReduce Scripts
Joins
Subqueries
Views
User-Defined Functions
Writing a UDF
Writing a UDAF
13. HBase
HBasics
Backdrop
Concepts
Whirlwind Tour of the Data Model
Implementation
Installation
Test Drive
Clients
Java
Avro, REST, and Thrift
Example
Schemas
Loading Data
Web Queries
HBase Versus RDBMS
Successful Service
HBase
Use Case: HBase at Streamy.com
Praxis
Versions
HDFS
UI
Metrics
Schema Design
Counters
Bulk Load
14. ZooKeeper
Installing and Running ZooKeeper
An Example
Group Membership in ZooKeeper
Creating the Group
Joining a Group
Listing Members in a Group
Deleting a Group
The ZooKeeper Service
Data Model
Operations
Implementation
Consistency
Sessions
States
Building Applications with ZooKeeper
A Configuration Service
The Resilient ZooKeeper Application
A Lock Service
More Distributed Data Structures and Protocols
ZooKeeper in Production
Resilience and Performance
Configuration
15. Sqoop
Getting Sqoop
Sqoop Connectors
A Sample Import
Text and Binary File Formats
Generated Code
Additional Serialization Systems
Imports: A Deeper Look
Controlling the Import
Imports and Consistency
Direct-mode Imports
Working with Imported Data
Imported Data and Hive
Importing Large Objects
Performing an Export
Exports: A Deeper Look
Exports and Transactionality
Exports and SequenceFiles
16. Case Studies
Hadoop Usage at Last.fm
Last.fm: The Social Music Revolution
Hadoop at Last.fm
Generating Charts with Hadoop
The Track Statistics Program
Summary
Hadoop and Hive at Facebook
Hadoop at Facebook
Hypothetical Use Case Studies
Hive
Problems and Future Work
Nutch Search Engine
Data Structures
Selected Examples of Hadoop Data Processing in Nutch
Summary
Log Processing at Rackspace
Requirements/The Problem
Brief History
Choosing Hadoop
Collection and Storage
MapReduce for Logs
Cascading
Fields, Tuples, and Pipes
Operations
Taps, Schemes, and Flows
Cascading in Practice
Flexibility
Hadoop and Cascading at ShareThis
Summary
TeraByte Sort on Apache Hadoop
Using Pig and Wukong to Explore Billion-edge Network Graphs
Measuring Community
Everybody's Talkin' at Me: The Twitter Reply Graph
Symmetric Links
Community Extraction
A. Installing Apache Hadoop
B. Cloudera's Distribution Including Apache Hadoop
C. Preparing the NCDC Weather Data
Index