3.1. Patch Information for Hadoop

Hadoop is based on Apache Hadoop 2.2.0 and includes the following additional Apache JIRAs for this release:

  • HADOOP-682: Hadoop namenode -format doesn't work any more if target directory doesn't exist

  • HADOOP-717: When there are few reducers, sorting should be done by mappers

  • HADOOP-722: native-hadoop deficiencies

  • HADOOP-4093: Fixed a bug where AzureBlockPlacementPolicy#chooseTarget returns only one DataNode when the replication factor is greater than three

  • HADOOP-6496: Fixed the HttpServer issue that caused incorrect rendering of the web interface for HBase (HttpServer sent the wrong content-type for CSS files)

  • HADOOP-7096: Allow setting of end-of-record delimiter for TextInputFormat

  • HADOOP-7389: Fixed test failures caused when tests use the TestingGroups

  • HADOOP-7827: Fixed issue with JSP pages for web interfaces

  • HADOOP-7868: Hadoop native fails to compile when default linker option is -Wl,--as-needed

  • HADOOP-8223: Applied initial patch for branch-1-win

  • HADOOP-8234: Enabled user group mappings on Windows platform

  • HADOOP-8235: Added support for file permissions and ownership on Windows for RawLocalFileSystem

  • HADOOP-8374: Improved support for hard link manipulation on Windows

  • HADOOP-8409: Fixed TestCommandLineJobSubmission and TestGenericOptionsParser to work for Windows

  • HADOOP-8411: Fixed TestStorageDirectoryFailure, TestTaskLogsTruncater, TestWebHdfsUrl and TestSecurityUtil failures on Windows

  • HADOOP-8414: Fixed issues caused by localhost resolving to incorrect address on Windows

  • HADOOP-8420: Hadoop Common creating package-info.java must not depend on sh

  • HADOOP-8424: Fixed Classpath issues that caused intermittent failures for web user interface on Windows

  • HADOOP-8440: Fixed failures for HarFileSystem.decodeHarURI

  • HADOOP-8453: Added unit tests for Winutils program. Winutils is the Windows console program that emulates the Linux command line utilities used by Hadoop

  • HADOOP-8454: Fixed bugs for the chmod command in Winutils program

  • HADOOP-8457: Fixed the file ownership issue for users in the Administrators groups on Windows

  • HADOOP-8486: Fixed the resource leak caused by open file handles for SequenceFile

  • HADOOP-8487: Fixed the HDFS tests to use correct test paths

  • HADOOP-8534: Fixed failures for those tests that left configuration files open

  • HADOOP-8544: Moved an assertion location in winutils chmod command

  • HADOOP-8564: Port and extend Hadoop native libraries for Windows to address DataNode concurrent reading and writing issue

  • HADOOP-8617: Backport HADOOP-6148, HADOOP-6166 and HADOOP-7333 for a pure Java CRC32 calculator implementation

  • HADOOP-8618: Fixed build failures caused by merging of the Hadoop v1.0.3 branch

  • HADOOP-8657: Fixed TestCLI to remove hardcoded value for the file length

  • HADOOP-8664: Fixed the Hadoop streaming job issue that required the user to provide full path to commands

  • HADOOP-8731: Added public distributed cache support for Windows. This fixes the failures for TestTrackerDistributedCacheManager

  • HADOOP-8732: Fixed test failures caused by incorrect process serialization on Windows

  • HADOOP-8733: Fixed the failures caused by the dependencies in the test.sh script file

  • HADOOP-8734: Fixed LocalJobRunner to support private distributed cache

  • HADOOP-8739: Fixed command line parsing on Windows

  • HADOOP-8763: Fixed issues caused when setting group owner on Windows

  • HADOOP-8820: Backport HADOOP-8469 and HADOOP-8470: Make NetworkTopology class pluggable and add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology

  • HADOOP-8634: Fixed the errors that occurred when the FileSystem deleteOnExit method is invoked

  • HADOOP-8836: UGI should throw exception in case winutils.exe cannot be loaded

  • HADOOP-8645: HADOOP_HOME and -Dhadoop.home (from hadoop wrapper script) are not uniformly handled

  • HADOOP-8694: Added symlink support to Windows platform

  • HADOOP-8847: Change untar to use Java API on Windows instead of spawning tar process

  • HADOOP-8868: FileUtil#chmod should normalize the path before calling into shell APIs

  • HADOOP-8872: Fixed an issue that occurred when invoking the FileSystem.length() method on a Windows machine using JDK 6.x

  • HADOOP-8874: Added an API to retrieve valid HADOOP_HOME and bin path. This patch adds a consistency layer for HADOOP_HOME lookups and provides abstractions to qualify bin paths of hadoop binary components

  • HADOOP-8880: Fixed Hive test failures caused by missing Jersey JAR files in the POM template

  • HADOOP-8899: Fixed issues caused by the classpath exceeding the maximum operating system (OS) limit

  • HADOOP-8900: BuiltInGzipDecompressor throws IOException - stored gzip size doesn't match decompressed size

  • HADOOP-8902: Enabled Gridmix v1 and v2 benchmarks on the Windows platform

  • HADOOP-8903: Added support for HADOOP_USER_CLASSPATH_FIRST environment variable in the hadoop.cmd file

  • HADOOP-8907: Provide means to look for zlib1.dll next to hadoop.dll on Windows

  • HADOOP-8908: Refactor winutil.exe related code

  • HADOOP-8911: CRLF characters in source and text files

  • HADOOP-8912: Add .gitattributes file to prevent CRLF and LF mismatches for source and text files

  • HADOOP-8935: Improved Winutils to handle failures of the winutils ls command

  • HADOOP-8972: Move winutils tests from bat to Java

  • HADOOP-9026: hadoop.cmd fails to initialize if user's %path% variable has parenthesis

  • HADOOP-9027: Build fails on Windows without sh/sed/echo in the path

  • HADOOP-9036: Fix racy test case TestSinkQueue

  • HADOOP-9040: Added fixes for TaskController

  • HADOOP-9061: Java6+Windows does not work well with symlinks

  • HADOOP-9062: hadoop-env.cmd overwrites the value of *_OPTS set before install

  • HADOOP-9071: Improved Ivy log levels

  • HADOOP-9074: Hadoop install scripts for Windows

  • HADOOP-9090: Support on-demand publish of metrics

  • HADOOP-9095: Backport HADOOP-8372: NetUtils.normalizeHostName() incorrectly handles hostname starting with a numeric character

  • HADOOP-9099: TestNetUtils fails if "UnknownHost" is resolved as a valid hostname

  • HADOOP-9102: winutils task isAlive does not return a non-zero exit code if the requested task is not alive

  • HADOOP-9110: winutils ls off-by-one error indexing MONTHS array can cause access violation

  • HADOOP-9174: TestSecurityUtil fails with OpenJDK 7

  • HADOOP-9175: TestWritableName fails with OpenJDK 7

  • HADOOP-9177: Address issues reported by static code analysis on winutils

  • HADOOP-9179: TestFileSystem fails with OpenJDK 7

  • HADOOP-9185: TestFileCreation.testFsClose should clean up on exit

  • HADOOP-9191: TestAccessControlList and TestJobHistoryConfig fail with JDK7

  • HADOOP-9111: Change some JUnit 3 tests to JUnit 4 so that @Ignore tests can be run with Ant v1.8.x

  • HADOOP-9250: Windows installer bugfixes

  • HADOOP-9660: [WINDOWS] Powershell / cmd parses -Dkey=value from command line as [-Dkey, value] which breaks GenericOptionsParser

  • HADOOP-10093: hadoop-env.cmd sets HADOOP_CLIENT_OPTS with a max heap size that is too small

  • HADOOP-10094: NPE in GenericOptionsParser#preProcessForWindows()

  • MAPREDUCE-782: Use PureJavaCrc32 in MapReduce spills

  • MAPREDUCE-1806 (HADOOP-136): Fixed issues in CombineFileInputFormat that caused failure while using Sqoop to export files in ASV

  • MAPREDUCE-4201: Fixed issues related to obtaining PIDs on Windows

  • MAPREDUCE-4203: Added an implementation of the process tree for Windows

  • MAPREDUCE-4204: Improved ProcfsBasedProcessTree to enable the resource collection object to be pluggable

  • MAPREDUCE-4260: Added support to use JobObject for spawning tasks on Windows platform

  • MAPREDUCE-4321: Fixed failures for DefaultTaskController on Windows

  • MAPREDUCE-4332: Fixed command length abort issues on Windows

  • MAPREDUCE-4368: Fixed TaskRunner to handle the event when java.library.path contains a quoted path with embedded spaces on Windows platform

  • MAPREDUCE-4369: Fixed streaming job failures with WindowsResourceCalculatorPlugin

  • MAPREDUCE-4374: Added support for configurable environment for child map/reduce tasks on Windows

  • MAPREDUCE-4400: Fixed performance regression for small jobs and workflows

  • MAPREDUCE-4510: Fixed redundant checks and logging of getconf on Windows

  • MAPREDUCE-4561: Added support for node health scripts on Windows

  • MAPREDUCE-4564: Fixed shell timeout mechanism. This fix enables successful termination of those processes that are spawned by Winutils

  • MAPREDUCE-4597: Fixed intermittent failures for TestKillSubProcesses

  • MAPREDUCE-4598: Added support for node health scripts on Windows

  • MAPREDUCE-4657: WindowsResourceCalculatorPlugin throws a NullPointerException

  • MAPREDUCE-4909: TestKeyValueTextInputFormat fails with OpenJDK 7 on Windows

  • MAPREDUCE-4914: TestMiniMRDFSSort fails with OpenJDK 7

  • MAPREDUCE-4915: TestShuffleExceptionCount fails with OpenJDK 7

  • MAPREDUCE-5451: MR uses LD_LIBRARY_PATH which doesn't mean anything in Windows

  • MAPREDUCE-5604: TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

  • MAPREDUCE-5616: MR Client-AppMaster RPC max retries on socket timeout is too high

  • HDFS-385: Added a new experimental API, BlockPlacementPolicy, that allows investigating alternate rules for locating block replicas

  • HDFS-496: Backport: Use PureJavaCrc32 in HDFS

  • HDFS-3163: Fixed failures for TestHDFSCLI.testAll which occurred when the user name was not provided in lowercase

  • HDFS-3424: Fixed TestDatanodeBlockScanner and TestReplication failures on Windows

  • HDFS-3564: Added enhancements to the block placement policy. This enhancement enables a pluggable placement policy and provides a new API that enables moving blocks for balancing. It also enables the placement policy to decide the number of racks and provides ability to extend the policy

  • HDFS-3566: Add AzureBlockPlacementPolicy to handle fault and upgrade domains in Azure. This policy distributes replicas across both the fault and the upgrade domains to ensure zero data loss

  • HDFS-3763: Fixed the TestNameNodeMXBean failures on Windows

  • HDFS-3766: Fixed TestStorageRestore on Windows

  • HDFS-3833: Fixed TestDFSShell failures on Windows caused by concurrent file read/write

  • HDFS-3941: Backport HDFS-3498 and HDFS-3601: Support replica removal in BlockPlacementPolicy and make BlockPlacementPolicyDefault extensible for reusing code in subclasses, and add BlockPlacementPolicyWithNodeGroup to support block placement with 4-layer network topology

  • HDFS-3942: Backport HDFS-3495 and HDFS-4234: Update Balancer to support new NetworkTopology with NodeGroup and use generic code for choosing DataNode in Balancer

  • HDFS-4065: TestDFSShell.testGet sporadically fails attempting to corrupt block files due to race condition

  • HDFS-4320: Add a separate configuration for NameNode RPC address instead of using fs.default.name

  • HDFS-4337: Backport HDFS-4240: For nodegroup-aware block placement, when a node is excluded, the nodes in the same nodegroup should also be excluded

  • HDFS-4341: Set default data dir permission in MiniDFSClusterWithNodeGroup

  • HDFS-4355: TestNameNodeMetrics.testCorruptBlock fails with OpenJDK 7

  • HDFS-4358: TestCheckpoint failure with JDK7

  • HDFS-4413: Secondary namenode won't start if HDFS isn't the default file system

  • HDFS-4633: TestDFSClientExcludedNodes fails sporadically if excluded nodes cache expires too quickly

  • HDFS-5065: TestSymlinkHdfsDisable fails on Windows

  • HDFS-5089: When a LayoutVersion supports SNAPSHOT, it must support FSIMAGE_NAME_OPTIMIZATION

  • HDFS-5338: Add a conf to disable hostname check in DN registration

  • HDFS-5375: hdfs.cmd does not expose several snapshot commands

  • HDFS-5413: hdfs.cmd does not support passthrough to any arbitrary class

  • HDFS-5432: TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name

  • HDFS-5456: NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist

  • HDFS-6527: Backport HADOOP-7389: Use of TestingGroups by tests causes subsequent tests to fail

  • YARN-1331: yarn.cmd exits with NoClassDefFoundError trying to run rmadmin or logs

  • YARN-1357: TestContainerLaunch.testContainerEnvVariables fails on Windows

  • YARN-1358: TestYarnCLI fails on Windows due to line endings

  • YARN-1349: yarn.cmd does not support passthrough to any arbitrary class

  • YARN-1395: Distributed shell application master launched with debug flag can hang waiting for external ls process

  • BUG-8178: Datanodes fail to register with namenode due to minimum version check
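
As an illustration of the problem described in HADOOP-9660 above, where the Windows shell hands -Dkey=value to the JVM as the two tokens [-Dkey, value], the following minimal sketch shows one way such tokens could be rejoined before option parsing. This is not the code from the actual patch; the class name RejoinWindowsArgs and the method rejoin are hypothetical and exist only for this example, and the sketch uses only the JDK so that it runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
class RejoinWindowsArgs {

    // Merges a bare "-Dkey" token with the value token that follows it,
    // producing "-Dkey=value". Tokens that already contain "=" and all
    // other arguments are passed through unchanged.
    static List<String> rejoin(String[] args) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            boolean splitProperty = arg.startsWith("-D")
                    && arg.length() > 2
                    && !arg.contains("=")
                    && i + 1 < args.length;
            if (splitProperty) {
                out.add(arg + "=" + args[i + 1]);
                i++; // skip the value token that was just consumed
            } else {
                out.add(arg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulates what the JVM receives after the shell splits -Dkey=value.
        String[] split = {"-Dkey", "value", "input", "output"};
        System.out.println(rejoin(split)); // prints [-Dkey=value, input, output]
    }
}
```

The sketch assumes that a bare -Dkey token is always followed by its value; a trailing -Dkey with no following token is left untouched rather than guessed at.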