HDP-2.3.6 Release Notes

Hadoop

In addition to any Apache patches ported over from the 2.4.x core, HDP 2.3.6 provides the following Apache patches:

  • HADOOP-11252: RPC client does not time out by default.

  • HADOOP-11785: Reduce the number of listStatus operations in distcp buildListing.

  • HADOOP-11827: Speed-up distcp buildListing() using threadpool.

  • HADOOP-11876: Refactor code to make it more readable, minor maybePrintStats bug.

  • HADOOP-11901: BytesWritable fails to support 2G chunks due to integer overflow.

  • HADOOP-12423: Handle failure of registering shutdownhook by ShutdownHookManager in static block.

  • HADOOP-12559: KMS connection failures should trigger TGT renewal.

  • HADOOP-12672: RPC timeout should not override IPC ping interval.

  • HADOOP-12716: KerberosAuthenticator#doSpnegoSequence uses incorrect class to determine isKeyTab in JDK8.

  • HADOOP-12847: Hadoop daemonlog should support HTTPS and SPNEGO for Kerberized cluster.

  • HADOOP-12950: ShutdownHookManager should have a timeout for each registered shutdown hook.

  • HADOOP-12984: Add GenericTestUtils.getTestDir method and use it for temporary directory in tests.

  • HADOOP-12993: Change ShutdownHookManager complete shutdown log from INFO to DEBUG.

  • HADOOP-13008: Add XFS Filter for UIs to Hadoop Common.

  • HADOOP-13039: Add documentation for configuration property ipc.maximum.data.length.

  • HADOOP-13098: Dynamic LogLevel setting page should accept case-insensitive log level string.

  • HADOOP-13103: Group resolution from LDAP may fail on javax.naming.ServiceUnavailableException.

  • HADOOP-1540: breaks backward compatibility.

  • HADOOP-1540: Support file exclusion list in distcp.

  • HDFS-10178: Permanent write failures can happen if pipeline recoveries occur for the first packet.

  • HDFS-10216: Distcp -diff throws exception when handling relative path.

  • HDFS-10223: peerFromSocketAndKey performs SASL exchange before setting connection timeouts.

  • HDFS-10312: Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction.

  • HDFS-10313: Distcp needs to enforce the order of snapshot names passed to -diff.

  • HDFS-10335: Mover$Processor#chooseTarget() always chooses the first matching target storage group.

  • HDFS-10341: Add a metric to expose the timeout number of pending replication blocks.

  • HDFS-10347: NameNode report bad block method doesn't log the bad block or datanode.

  • HDFS-10397: Distcp should ignore -delete option if -diff option is provided instead of exiting.

  • HDFS-10424: DatanodeLifelineProtocol cannot be used in a secure cluster.

  • HDFS-10438: When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.

  • HDFS-6951: Correctly persist raw namespace xattrs to edit log and fsimage.

  • HDFS-7163: WebHdfsFileSystem should retry reads according to the configured retry policy.

  • HDFS-7916: 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop.

  • HDFS-8512: WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info.

  • HDFS-8828: Utilize Snapshot diff report to build diff copy list in distcp.

  • HDFS-8887: Expose storage type and storage ID in BlockLocation.

  • HDFS-9198: Coalesce IBR processing in the NN.

  • HDFS-9476: TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fails.

  • HDFS-9612: DistCp worker threads are not terminated after jobs are done.

  • HDFS-9764: DistCp doesn't print value for several arguments including -numListstatusThreads.

  • HDFS-9902: Support different values of dfs.datanode.du.reserved per storage type.

  • HDFS-9917: IBRs accumulate more objects when the SNN was down for some time.

  • HDFS-9958: BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

  • MAPREDUCE-6514: Fixed MapReduce ApplicationMaster to properly update the resources ask after ramping down all reducers, avoiding job hangs.

  • MAPREDUCE-6689: MapReduce job can infinitely increase number of reducer resource requests.

  • YARN-3021: YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp.

  • YARN-3602: TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails intermittently due to IOException from cleanup.

  • YARN-3846: RM Web UI queue filter is not working for sub queue.

  • YARN-4596: SystemMetricPublisher should not swallow error messages from TimelineClient#putEntities.

  • YARN-4717: TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup.

  • YARN-4820: ResourceManager web redirects in HA mode drop query parameters.

  • YARN-4844: Scheduler keeps skipping scheduling because a large amount of requested resources causes pending resources to overflow.

  • YARN-5048: DelegationTokenRenewer#skipTokenRenewal may throw NPE.

  • YARN-5076: YARN web interfaces lack XFS (Cross-Frame Script) protection.

  • YARN-5112: Excessive log warnings for directory permission issue on NM recovery.

HDP 2.3.6 ports the following Apache patches from the 2.4.x core:

  • HADOOP-10365: BufferedOutputStream in FileUtil#unpackEntries() should be closed in finally block.

  • HADOOP-10406: TestIPC.testIpcWithReaderQueuing may fail.

  • HADOOP-11212: NetUtils.wrapException to handle SocketException explicitly.

  • HADOOP-12100: ImmutableFsPermission should not override applyUmask since that method doesn't modify the FsPermission.

  • HADOOP-12103: Small refactoring of DelegationTokenAuthenticationFilter to allow code sharing.

  • HADOOP-12107: Long running apps may have a huge number of StatisticsData instances under FileSystem.

  • HADOOP-12161: Add getStoragePolicy API to the FileSystem interface.

  • HADOOP-12191: Bzip2Factory is not thread safe.

  • HADOOP-12213: Interrupted exception can occur when Client#stop is called.

  • HADOOP-12348: MetricsSystemImpl creates MetricsSourceAdapter with wrong time unit parameter.

  • HADOOP-12374: Description of HDFS expunge command is confusing.

  • HADOOP-12426: Add Entry point for Kerberos health check.

  • HADOOP-12464: Interrupted client may try to fail-over and retry.

  • HADOOP-12482: Race condition in JMX cache update.

  • HADOOP-12551: Introduce FileNotFoundException for WASB FileSystem API.

  • HADOOP-12589: Fix intermittent test failure of TestCopyPreserveFlag.

  • HADOOP-12608: Fix exception message in WASB when connecting with anonymous credential.

  • HADOOP-12609: Fix intermittent failure of TestDecayRpcScheduler.

  • HADOOP-12678: Handle empty rename pending metadata file during atomic rename in redo path.

  • HADOOP-12699: TestKMS#testKMSProvider intermittently fails during 'test rollover draining'.

  • HADOOP-12706: TestLocalFsFCStatistics#testStatisticsThreadLocalDataCleanUp times out occasionally.

  • HADOOP-12752: Improve diagnostics/use of envvar/sysprop credential propagation.

  • HADOOP-12787: KMS SPNEGO sequence does not work with WEBHDFS.

  • HADOOP-12795: KMS does not log detailed stack trace for unexpected errors.

  • HADOOP-12825: Log slow name resolutions.

  • HADOOP-12829: StatisticsDataReferenceCleaner swallows interrupt exceptions.

  • HADOOP-12851: S3AFileSystem Uptake of ProviderUtils.excludeIncompatibleCredentialProviders.

  • HADOOP-12903: IPC Server should allow suppressing exception logging by type, not log 'server too busy' messages.

  • HADOOP-12958: PhantomReference for filesystem statistics can trigger OOM.

  • HADOOP-13026: Should not wrap IOExceptions into an AuthenticationException in KerberosAuthenticator.

  • HDFS-10199: Unit tests TestCopyFiles, TestDistCh, TestLogalyzer under org.apache.hadoop.tools are failing.

  • HDFS-10270: TestJMXGet:testNameNode() fails.

  • HDFS-10281: TestPendingCorruptDnMessages fails intermittently.

  • HDFS-10283: o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending fails intermittently.

  • HDFS-6101: TestReplaceDatanodeOnFailure fails occasionally.

  • HDFS-8113: Add check for null BlockCollection pointers in BlockInfoContiguous structures.

  • HDFS-8337: Accessing HttpFS via WebHDFS doesn't work from a jar with Kerberos.

  • HDFS-8647: Abstract BlockManager's rack policy into BlockPlacementPolicy.

  • HDFS-8659: Block scanner INFO message is spamming logs.

  • HDFS-8676: Delayed rolling upgrade finalization can cause heartbeat expiration.

  • HDFS-8729: Fix TestFileTruncate#testTruncateWithDataNodesRestartImmediately which occasionally failed.

  • HDFS-8772: Fix TestStandbyIsHot#testDatanodeRestarts which occasionally fails.

  • HDFS-8806: Inconsistent metrics: number of missing blocks with replication factor 1 not properly cleared.

  • HDFS-8815: DFS getStoragePolicy implementation using single RPC call.

  • HDFS-8891: HDFS concat should keep srcs order.

  • HDFS-9072: Fix random failures in TestJMXGet.

  • HDFS-9130: Use GenericTestUtils#setLogLevel to set the logging level.

  • HDFS-9221: HdfsServerConstants#ReplicaState#getState should avoid calling values() since it creates a temporary array.

  • HDFS-9239: DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness.

  • HDFS-9289: Make DataStreamer#block thread safe and verify genStamp in commitBlock.

  • HDFS-9290: DFSClient#callAppend() is not backward compatible for slightly older NameNodes.

  • HDFS-9313: Possible NullPointerException in BlockManager if no excess replica can be chosen.

  • HDFS-9314: Improve BlockPlacementPolicyDefault's picking of excess replicas.

  • HDFS-9347: Invariant assumption in TestQuorumJournalManager.shutdown() is wrong.

  • HDFS-9358: TestNodeCount#testNodeCount timed out.

  • HDFS-9383: TestByteArrayManager#testByteArrayManager fails.

  • HDFS-9402: Switch DataNode.LOG to use slf4j.

  • HDFS-9406: FSImage may get corrupted after deleting snapshot.

  • HDFS-9431: DistributedFileSystem#concat fails if the target path is relative.

  • HDFS-9434: Recommission a datanode with 500k blocks may pause NN for 30 seconds for printing info log messages.

  • HDFS-9445: Datanode may deadlock while handling a bad volume.

  • HDFS-9478: Reason for failing ipc.FairCallQueue construction should be thrown.

  • HDFS-9534: Add CLI command to clear storage policy from a path.

  • HDFS-9557: Reduce object allocation in PB conversion.

  • HDFS-9572: Prevent DataNode log spam if a client connects on the data transfer port but sends no data.

  • HDFS-9574: Reduce client failures during datanode restart.

  • HDFS-9600: Do not check replication if the block is under construction.

  • HDFS-9625: Setting replication for an empty file fails when a storage policy is set.

  • HDFS-9655: NN should start JVM pause monitor before loading fsimage.

  • HDFS-9661: Deadlock in DN.FsDatasetImpl between moveBlockAcrossStorage and createRbw.

  • HDFS-9672: o.a.h.hdfs.TestLeaseRecovery2 fails intermittently.

  • HDFS-9710: DN can be configured to send block receipt IBRs in batches.

  • HDFS-9724: Degraded performance in WebHDFS listing as it does not reuse ObjectMapper.

  • HDFS-9726: Refactor IBR code to a new class.

  • HDFS-9740: Use a reasonable limit in DFSTestUtil.waitForMetric().

  • HDFS-9743: Fix TestLazyPersistFiles#testFallbackToDiskFull in branch-2.7.

  • HDFS-9752: Permanent write failures may happen to slow writers during datanode rolling upgrades.

  • HDFS-9768: Reuse ObjectMapper instance in HDFS to improve the performance.

  • HDFS-9790: HDFS Balancer should exit with a proper message if upgrade is not finalized.

  • HDFS-9839: Reduce verbosity of processReport logging.

  • HDFS-9851: NameNode throws NPE when setPermission is called on a path that does not exist.

  • HDFS-9854: Log cipher suite negotiation more verbosely.

  • HDFS-9906: Remove spammy log spew when a datanode is restarted.

  • HDFS-9941: Do not log StandbyException on NN, other minor logging fixes.

  • MAPREDUCE-6436: JobHistory cache issue.

  • MAPREDUCE-6460: TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails.

  • MAPREDUCE-6492: AsyncDispatcher exit with NPE on TaskAttemptImpl#sendJHStartEventForAssignedFailTask.

  • MAPREDUCE-6566: Add retry support to MapReduce CLI tool.

  • MAPREDUCE-6577: MR AM unable to load native library without MR_AM_ADMIN_USER_ENV set.

  • MAPREDUCE-6618: YarnClientProtocolProvider leaking the YarnClient thread.

  • MAPREDUCE-6621: Memory Leak in JobClient#submitJobInternal().

  • MAPREDUCE-6635: Unsafe long to int conversion in UncompressedSplitLineReader and IndexOutOfBoundsException.

  • MAPREDUCE-6670: TestJobListCache#testEviction sometimes fails on Windows with timeout.

  • MAPREDUCE-6680: JHS UserLogDir scan algorithm could sometimes skip directories with updates in CloudFS.

  • YARN-2046: Out of band heartbeats are sent only on container kill and possibly too early.

  • YARN-2871: TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk.

  • YARN-3102: Decommissioned Nodes not listed in Web UI.

  • YARN-3480: Remove attempts that are beyond max-attempt limit from state store.

  • YARN-3695: ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception.

  • YARN-3769: Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler.

  • YARN-4155: TestLogAggregationService.testLogAggregationServiceWithInterval failing.

  • YARN-4309: Add container launch related debug information to container logs when a container fails.

  • YARN-4365: FileSystemNodeLabelStore should check for root dir existence on startup.

  • YARN-4414: Nodemanager connection errors are retried at multiple levels.

  • YARN-4422: Generic AHS sometimes doesn't show started, node, or logs on App page.

  • YARN-4428: Redirect RM page to AHS page when AHS turned on and RM page is not available.

  • YARN-4439: Clarify NMContainerStatus#toString method.

  • YARN-4497: RM might fail to restart when recovering apps whose attempts are missing.

  • YARN-4546: ResourceManager crash due to scheduling opportunity overflow.

  • YARN-4565: When sizeBasedWeight FairOrderingPolicy is enabled, the cluster can appear to be virtually deadlocked under stress.

  • YARN-4584: RM startup failure when AM attempts greater than max-attempts.

  • YARN-4598: Invalid event: RESOURCE_FAILED at CONTAINER_CLEANEDUP_AFTER_KILL.

  • YARN-4610: Reservations that continue looking for one app cause other apps to starve.

  • YARN-4623: TestSystemMetricsPublisher#testPublishAppAttemptMetricsForUnmanagedAM fails with NPE on branch-2.7.

  • YARN-4625: Make ApplicationSubmissionContext and ApplicationSubmissionContextInfo more consistent.

  • YARN-4633: Fix random test failure in TestRMRestart#testRMRestartAfterPreemption.

  • YARN-4680: Fix TimerTasks leak in Application Timeline Server (ATS) v1.5 Writer.

  • YARN-4696: TimelineClient to add flush operation for deterministic writes (including testing) and Changes to EntityGroupFSTimelineStore for testability.

  • YARN-4709: NMWebServices produces incorrect JSON for containers.

  • YARN-4723: NodesListManager$UnknownNodeId ClassCastException.

  • YARN-4737: Add CSRF filter support in YARN.

  • YARN-4769: Add support for CSRF header in the dump capacity scheduler logs and kill app buttons in RM web UI.

  • YARN-4785: Inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API.

  • YARN-4814: ATS 1.5 timelineclient impl calls flush after every event write.

  • YARN-4815: ATS 1.5 timelineclient impl tries to create attempt directory for every event call.

  • YARN-4817: TimelineClient ATSv1.5 logging is very noisy.

  • YARN-4916: TestNMProxy.tesNMProxyRPCRetry fails.

  • YARN-4928: Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon.

  • YARN-4954: TestYarnClient.testReservationAPIs fails on machines with less than 4 GB available memory.

  • YARN-4955: Add retry for SocketTimeoutException in TimelineClient.

  • YARN-4965: Distributed shell AM failed due to ClientHandlerException thrown by jersey.

  • YARN-4968: Fix two scheduler related UTs in YARN.

HDP 2.3.4.7 provided no additional Apache patches.

HDP 2.3.4 provided the following Apache patches:

  • HADOOP-11098: [JDK8] Max Non Heap Memory default changed between JDK7 and 8.

  • HADOOP-11628: SPNEGO auth does not work with CNAMEs in JDK8.

  • HADOOP-11685: StorageException complaining "no lease ID" during HBase distributed log splitting.

  • HADOOP-11918: Listing an empty s3a root directory throws FileNotFound.

  • HADOOP-11932: MetricsSinkAdapter may hang when being stopped.

  • HADOOP-12049 Control http authentication cookie persistence via configuration.

  • HADOOP-12089: StorageException complaining "no lease ID" when updating FolderLastModifiedTime in WASB.

  • HADOOP-12186 ActiveStandbyElector shouldn't call monitorLockNodeAsync multiple times.

  • HADOOP-12239: StorageException complaining "no lease ID" when updating FolderLastModifiedTime in WASB.

  • HADOOP-12324: Better exception reporting in SaslPlainServer.

  • HADOOP-12334: Change Mode Of Copy Operation of HBase WAL Archiving to bypass Azure Storage Throttling after retries.

  • HADOOP-12350: WASB Logging; Improve WASB Logging around deletes, reads, and writes.

  • HADOOP-12407: Test failing; hadoop.ipc.TestSaslRPC.

  • HADOOP-12413: AccessControlList should avoid calling getGroupNames in isUserInList with empty groups.

  • HADOOP-12437 Allow SecurityUtil to lookup alternate hostnames.

  • HADOOP-12438: TestLocalFileSystem tests can fail on Windows after HDFS-8767 fix for handling pipe.

  • HADOOP-12440: TestRPC#testRPCServerShutdown did not produce the desired thread states before shutting down.

  • HADOOP-12441: Fixed kill-command behavior to work correctly across OSes by using bash shell built-in.

  • HADOOP-12463 Fix TestShell.testGetSignalKillCommand failure on windows.

  • HADOOP-12484: Single File Rename Throws Incorrectly In Potential Race Condition Scenarios.

  • HADOOP-12508: delete fails with exception when lease is held on blob.

  • HADOOP-12533: Introduce FileNotFoundException in WASB for read and seek API.

  • HADOOP-12540: TestAzureFileSystemInstrumentation#testClientErrorMetrics fails intermittently due to assumption that a lease error will be thrown.

  • HADOOP-12542: TestDNS fails on Windows after HADOOP-12437.

  • HADOOP-12577 Bump up commons-collections version to 3.2.2 to address a security flaw.

  • HADOOP-12617 SPNEGO authentication request to non-default realm gets default realm name inserted in target server principal.

  • HBASE-268 Rack locality improvement.

  • HDFS-4015: Safemode should count and report orphaned blocks.

  • HDFS-4366 Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks.

  • HDFS-4937 ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom.

  • HDFS-6481 DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs.

  • HDFS-6581 Support for writing to single replica in RAM.

  • HDFS-7390 Provide JMX metrics per storage type.

  • HDFS-7483 Display information per tier on the NameNode UI.

  • HDFS-7725 Incorrect "nodes in service" metrics caused all writes to fail.

  • HDFS-7858 Improve HA NameNode Failover detection on the client.

  • HDFS-7928: Scanning blocks from disk during rolling upgrade startup takes a lot of time if disks are busy.

  • HDFS-8099 Change "DFSInputStream has been closed already" message to debug log level.

  • HDFS-8209 Support different number of datanode directories in MiniDFSCluster.

  • HDFS-8554: TestDatanodeLayoutUpgrade fails on Windows.

  • HDFS-8656: Preserve compatibility of ClientProtocol#rollingUpgrade after finalization.

  • HDFS-8696: Make the lower and higher watermark in the DN Netty server configurable.

  • HDFS-8778: TestBlockReportRateLimiting#testLeaseExpiration can deadlock.

  • HDFS-8785 TestDistributedFileSystem is failing in trunk.

  • HDFS-8809: HDFS fsck reports under construction blocks as CORRUPT.

  • HDFS-8829 Make SO_RCVBUF and SO_SNDBUF size configurable for DataTransferProtocol sockets and allow configuring auto-tuning.

  • HDFS-8846: Add a unit test for INotify functionality across a layout version upgrade.

  • HDFS-8855: Webhdfs client leaks active NameNode connections.

  • HDFS-8930: Block report lease may leak if the 2nd full block report comes when NN is still in safemode.

  • HDFS-8950 NameNode refresh doesn't remove DataNodes that are no longer in the allowed list.

  • HDFS-8965: Harden edit log reading code against out of memory errors.

  • HDFS-8969: Clean up findbugs warnings for HDFS-8823 and HDFS-8932.

  • HDFS-9008: Balancer#Parameters class could use a builder pattern.

  • HDFS-9019: Adding informative message to sticky bit permission denied exception.

  • HDFS-9063: Correctly handle snapshot path for getContentSummary.

  • HDFS-9082: Change the log level in WebHdfsFileSystem.initialize() from INFO to DEBUG.

  • HDFS-9083: Replication violates block placement policy.

  • HDFS-9107: Prevent NNs unrecoverable death spiral after full GC.

  • HDFS-9112: Improve error message for Haadmin when multiple name service IDs are configured.

  • HDFS-9128: TestWebHdfsFileContextMainOperations and TestSWebHdfsFileContextMainOperations fail due to invalid HDFS path on Windows.

  • HDFS-9142: Separating Configuration object for NameNodes in MiniDFSCluster.

  • HDFS-9175: Change scope of 'AccessTokenProvider.getAccessToken()' and 'CredentialBasedAccessTokenProvider.getCredential()' abstract methods to public.

  • HDFS-9178 Slow datanode I/O can cause a wrong node to be marked bad.

  • HDFS-9184 Logging HDFS operation's caller context into audit logs.

  • HDFS-9205: Do not schedule corrupt blocks for replication.

  • HDFS-9220: Reading small file greater than 512 bytes that is open for append fails due to incorrect checksum.

  • HDFS-9273: ACLs on root directory may be lost after NN restart.

  • HDFS-9294: DFSClient deadlock when close file and failed to renew lease.

  • HDFS-9305 Delayed heartbeat processing causes Storm of subsequent heartbeats.

  • HDFS-9311: Support optional offload of NameNode HA service health checks to a separate RPC server.

  • HDFS-9343 Empty caller context considered invalid.

  • HDFS-9354: Fix TestBalancer#testBalancerWithZeroThreadsForMove on Windows.

  • HDFS-9362: TestAuditLogger#testAuditLoggerWithCallContext assumes Unix line endings, fails on Windows.

  • HDFS-9364 Unnecessary DNS resolution attempts when creating NameNodeProxies.

  • HDFS-9384: TestWebHdfsContentLength intermittently hangs and fails due to TCP conversation mismatch between client and server.

  • HDFS-9397 Fix typo for readChecksum() LOG.warn in BlockSender.java.

  • HDFS-9413: getContentSummary() on standby should throw StandbyException.

  • HDFS-9426: Rollingupgrade finalization is not backward compatible.

  • HDFS-9434: Recommission a datanode with 500k blocks may pause NN for 30 seconds for printing info log messages.

  • MAPREDUCE-5485 Allow repeating job commit by extending OutputCommitter API.

  • MAPREDUCE-6273 HistoryFileManager should check whether summaryFile exists to avoid FileNotFoundException causing HistoryFileInfo into MOVE_FAILED state.

  • MAPREDUCE-6302 Backport preempt reducers after a configurable timeout irrespective of headroom.

  • MAPREDUCE-6549 Multibyte delimiters with LineRecordReader cause duplicate records.

  • YARN-2194 Fix bug causing CGroups functionality to fail on RHEL7.

  • YARN-2571 RM to support YARN registry.

  • YARN-3467 Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI.

  • YARN-3600 AM container link is broken (on a killed application, at least).

  • YARN-3727 For better error recovery, check if the directory exists before using it for localization.

  • YARN-3751 Fixed AppInfo to check if used resources are null.

  • YARN-3766 Fixed the apps table column error of generic history web UI.

  • YARN-3849 Too much of preemption activity causing continuous killing of containers across queues.

  • YARN-4140 RM container allocation delayed in case of app submitted to Nodelabel partition.

  • YARN-4233 YARN Timeline Service plugin: ATS v1.5.

  • YARN-4285 Display resource usage as percentage of queue and cluster in the RM UI.

  • YARN-4287 Rack locality improvement.

  • YARN-4288 Fixed RMProxy to retry on IOException from local host.

  • YARN-4313 Race condition in MiniMRYarnCluster when getting history server address.

  • YARN-4345 YARN rmadmin -updateNodeResource doesn't work.

  • YARN-4347 Resource manager fails with Null pointer exception.

  • YARN-4349 YARN_APPLICATION call to ATS does not have YARN_APPLICATION_CALLER_CONTEXT.

  • YARN-4384 updateNodeResource CLI should not accept negative values for resource.

  • YARN-4405 Support node label store in non-appendable file system.

HDP 2.3.2 provided the following Apache patches:

NEW FEATURES

IMPROVEMENTS

  • HADOOP-10597 RPC Server signals backoff to clients when all request queues are full.

  • HADOOP-11960 Enable Azure-Storage Client Side logging.

  • HADOOP-12325 RPC Metrics: Add the ability to track and log slow RPCs.

  • HADOOP-12358 Add -safely flag to rm to prompt when deleting many files.

  • HDFS-4185 Add a metric for number of active leases.

  • HDFS-4396 Add START_MSG/SHUTDOWN_MSG for ZKFC.

  • HDFS-6860 BlockStateChange logs are too noisy.

  • HDFS-7923 The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages.

  • HDFS-8046 Allow better control of getContentSummary.

  • HDFS-8180 AbstractFileSystem Implementation for WebHdfs.

  • HDFS-8278 When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size.

  • HDFS-8432 Introduce a minimum compatible layout version to allow downgrade in more rolling upgrade use cases.

  • HDFS-8435 Support CreateFlag in WebHDFS.

  • HDFS-8549 Abort the balancer if an upgrade is in progress.

  • HDFS-8797 WebHdfsFileSystem creates too many connections for pread.

  • HDFS-8818 Changes the global moveExecutor to per datanode executors and changes MAX_SIZE_TO_MOVE to be configurable.

  • HDFS-8824 Do not use small blocks for balancing the cluster.

  • HDFS-8826 In Balancer, add an option to specify the source node list so that balancer only selects blocks to move from those nodes.

  • HDFS-8883 NameNode Metrics: Add FSNameSystem lock Queue Length.

  • HDFS-8911 NameNode Metrics: Add Editlog counters as a JMX metric.

  • HDFS-8983 NameNode support for protected directories.

  • YARN-2513 Host framework UIs in YARN for use with the ATS.

  • YARN-3197 Confusing log generated by CapacityScheduler.

  • YARN-3357 Move TestFifoScheduler to FIFO package.

  • YARN-3360 Add JMX metrics to TimelineDataManager.

  • YARN-3579 CommonNodeLabelsManager should support NodeLabel instead of string label name when getting node-to-label/label-to-label mappings.

  • YARN-3978 Configurably turn off the saving of container info in Generic AHS.

  • YARN-4082 Container shouldn't be killed when node's label updated.

  • YARN-4101 RM should print alert messages if ZooKeeper and ResourceManager get connection issues.

  • YARN-4149 YARN logs -am should provide an option to fetch all the log files.

BUG FIXES

  • HADOOP-11802 DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm.

  • HADOOP-12052 IPC client downgrades all exception types to IOE, breaks callers trying to use them.

  • HADOOP-12073 Azure FileSystem PageBlobInputStream does not return -1 on EOF.

  • HADOOP-12095 org.apache.hadoop.fs.shell.TestCount fails.

  • HADOOP-12304 Applications using FileContext fail with the default filesystem configured to be wasb/s3/etc.

  • HADOOP-8151 Error handling in Snappy decompressor throws invalid exceptions.

  • HDFS-6945 BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed.

  • HDFS-7608 HDFS dfsclient newConnectedPeer has no write timeout.

  • HDFS-7609 Avoid retry cache collision when Standby NameNode loading edits.

  • HDFS-8309 Skip unit test using DataNodeTestUtils#injectDataDirFailure() on Windows.

  • HDFS-8310 Fix TestCLI.testAll "help for find" on Windows.

  • HDFS-8311 DataStreamer.transfer() should timeout the socket InputStream.

  • HDFS-8384 Allow NN to startup if there are files having a lease but are not under construction.

  • HDFS-8431 HDFS crypto class not found in Windows.

  • HDFS-8539 HDFS doesn't have class 'debug' in Windows.

  • HDFS-8542 WebHDFS getHomeDirectory behavior does not match specification.

  • HDFS-8593 Calculation of effective layout version mishandles comparison to current layout version in storage.

  • HDFS-8767 RawLocalFileSystem.listStatus() returns null for UNIX pipe file.

  • HDFS-8850 VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks.

  • HDFS-8863 The remaining space check in BlockPlacementPolicyDefault is flawed.

  • HDFS-8879 Quota by storage type usage incorrectly initialized upon namenode restart.

  • HDFS-8885 ByteRangeInputStream used in webhdfs does not override available().

  • HDFS-8932 NPE thrown in NameNode when trying to get TotalSyncCount metric before editLogStream initialization.

  • HDFS-8939 Test(S)WebHdfsFileContextMainOperations failing on branch-2.

  • HDFS-8969 Clean up findbugs warnings for HDFS-8823 and HDFS-8932.

  • HDFS-8995 Flaw in registration bookkeeping can make DN die on reconnect.

  • HDFS-9009 Send metrics logs to NullAppender by default.

  • YARN-3413 Changed Nodelabel attributes (like exclusivity) to be settable only via addToClusterNodeLabels but not changeable at runtime.

  • YARN-3885 ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 levels.

  • YARN-3894 RM startup should fail for wrong CS xml NodeLabel capacity configuration.

  • YARN-3896 RMNode transitioned from RUNNING to REBOOTED because its response id has not been reset synchronously.

  • YARN-3932 SchedulerApplicationAttempt#getResourceUsageReport and UserInfo should be based on total-used-resources.

  • YARN-3971 Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery.

  • YARN-4087 Followup fixes after YARN-2019 regarding RM behavior when state-store error occurs.

  • YARN-4092 Fixed UI redirection to print useful messages when both RMs are in standby mode.

OPTIMIZATION

  • HADOOP-11772 RPC Invoker relies on static ClientCache which has synchronized(this) blocks.

  • HADOOP-12317 Applications fail on NM restart on some Linux distro because NM container recovery declares AM container as LOST.

  • HADOOP-7713 dfs -count -q should label output column.

  • HDFS-8856 Make LeaseManager#countPath O(1).

  • HDFS-8867 Enable optimized block reports.

HDP 2.3.0 provided the following Apache patches:

NEW FEATURES

  • HDFS-8008 Support client-side back off when the datanodes are congested.

  • HDFS-8009 Signal congestion on the DataNode.

  • YARN-1376 NM need to notify the log aggregation status to RM through heartbeat.

  • YARN-1402 Update related Web UI and CLI with exposing client API to check log aggregation status.

  • YARN-2498 Respect labels in preemption policy of capacity scheduler for inter-queue preemption.

  • YARN-2571 RM to support YARN registry.

  • YARN-2619 Added NodeManager support for disk IO isolation through cgroups.

  • YARN-3225 New parameter of CLI for decommissioning node gracefully in RMAdmin CLI.

  • YARN-3318 Create Initial OrderingPolicy Framework and FifoOrderingPolicy.

  • YARN-3319 Implement a FairOrderingPolicy.

  • YARN-3326 Support RESTful API for getLabelsToNodes.

  • YARN-3345 Add non-exclusive node label API.

  • YARN-3347 Improve YARN log command to get AMContainer logs as well as running containers logs.

  • YARN-3348 Add a 'YARN top' tool to help understand cluster usage.

  • YARN-3354 Add node label expression in ContainerTokenIdentifier to support RM recovery.

  • YARN-3361 CapacityScheduler side changes to support non-exclusive node labels.

  • YARN-3365 Enhanced NodeManager to support using the 'tc' tool via container-executor for outbound network traffic control.

  • YARN-3366 Enhanced NodeManager to support classifying/shaping outgoing network bandwidth traffic originating from YARN containers.

  • YARN-3410 YARN admin should be able to remove individual application records from RMStateStore.

  • YARN-3443 Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM.

  • YARN-3448 Added a rolling time-to-live LevelDB timeline store implementation.

  • YARN-3463 Integrate OrderingPolicy Framework with CapacityScheduler.

  • YARN-3505 Node's Log Aggregation Report with SUCCEED should not be cached in RMApps.

  • YARN-3541 Add version info on timeline service / generic history web UI and REST API.

IMPROVEMENTS

  • HADOOP-10597 RPC Server signals backoff to clients when all request queues are full.

  • YARN-1880 Cleanup TestApplicationClientProtocolOnHA.

  • YARN-2495 Allow admin to specify labels from each NM (Distributed configuration for node label).

  • YARN-2696 Queue sorting in CapacityScheduler should consider node label.

  • YARN-2868 FairScheduler: Metric for latency to allocate first container for an application.

  • YARN-2901 Add errors and warning metrics page to RM, NM web UI.

  • YARN-3243 CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obeys its capacity limits.

  • YARN-3248 Display count of nodes blacklisted by apps in the web UI.

  • YARN-3293 Track and display capacity scheduler health metrics in web UI.

  • YARN-3294 Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period.

  • YARN-3356 Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.

  • YARN-3362 Add node label usage in RM CapacityScheduler web UI.

  • YARN-3394 Enrich WebApplication proxy documentation.

  • YARN-3397 YARN rmadmin should skip -failover.

  • YARN-3404 Display queue name on application page.

  • YARN-3406 Display count of running containers in the RM's Web UI.

  • YARN-3451 Display attempt start time and elapsed time on the web UI.

  • YARN-3494 Expose AM resource limit and usage in CS QueueMetrics.

  • YARN-3503 Expose disk utilization percentage and bad local and log dir counts in NM metrics.

  • YARN-3511 Add errors and warnings page to ATS.

  • YARN-3565 NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String.

  • YARN-3581 Deprecate -directlyAccessNodeLabelStore in RMAdminCLI.

  • YARN-3583 Support of NodeLabel object instead of plain String in YarnClient side.

  • YARN-3593 Add label-type and Improve "DEFAULT_PARTITION" in Node Labels Page.

  • YARN-3700 Made generic history service load a number of latest applications according to the parameter or the configuration.

BUG FIXES

  • HADOOP-11859 PseudoAuthenticationHandler fails with httpcomponents v4.4.

  • HADOOP-7713 dfs -count -q should label output column.

  • HDFS-27 HDFS CLI with --config set to default config complains log file not found error.

  • HDFS-6666 Abort NameNode and DataNode startup if security is enabled but block access token is not enabled.

  • HDFS-7645 Fix CHANGES.txt

  • HDFS-7645 Rolling upgrade is restoring blocks from trash multiple times.

  • HDFS-7701 Support reporting per storage type quota and usage with Hadoop/HDFS shell.

  • HDFS-7890 Improve information on Top users for metrics in RollingWindowsManager and lower log level.

  • HDFS-7933 fsck should also report decommissioning replicas.

  • HDFS-7990 IBR delete ack should not be delayed.

  • HDFS-8008 Support client-side back off when the datanodes are congested.

  • HDFS-8009 Signal congestion on the DataNode.

  • HDFS-8055 NullPointerException when topology script is missing.

  • HDFS-8144 Split TestLazyPersistFiles into multiple tests.

  • HDFS-8152 Refactoring of lazy persist storage cases.

  • HDFS-8205 CommandFormat#parse() should not parse option as value of option.

  • HDFS-8211 DataNode UUID is always null in the JMX counter.

  • HDFS-8219 setStoragePolicy with folder behavior is different after cluster restart.

  • HDFS-8229 LAZY_PERSIST file gets deleted after NameNode restart.

  • HDFS-8232 Missing datanode counters when using Metrics2 sink interface.

  • HDFS-8276 LazyPersistFileScrubber should be disabled if scrubber interval configured zero.

  • YARN-2666 TestFairScheduler.testContinuousScheduling fails intermittently.

  • YARN-2740 Fix NodeLabelsManager to properly handle node label modifications when distributed node label configuration enabled.

  • YARN-2821 Fixed a problem that DistributedShell AM may hang if restarted.

  • YARN-3110 Few issues in ApplicationHistory web UI.

  • YARN-3136 Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers.

  • YARN-3266 RMContext#inactiveNodes should have NodeId as map key.

  • YARN-3269 yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path.

  • YARN-3305 Normalize AM resource request on app submission.

  • YARN-3343 Increased TestCapacitySchedulerNodeLabelUpdate#testNodeUpdate timeout.

  • YARN-3383 AdminService should use "warn" instead of "info" to log exception when operation fails.

  • YARN-3387 Previous AM's container completed status couldn't pass to current AM if AM and RM restarted during the same time.

  • YARN-3425 NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed.

  • YARN-3435 AM container to be allocated Appattempt AM container shown as null.

  • YARN-3459 Fix failure of TestLog4jWarningErrorMetricsAppender.

  • YARN-3517 RM web UI for dumping scheduler logs should be for admins only.

  • YARN-3530 ATS throws exception on trying to filter results without otherinfo.

  • YARN-3552 RM Web UI shows -1 running containers for completed apps.

  • YARN-3580 [JDK8] TestClientRMService.testGetLabelsToNodes fails.

  • YARN-3632 Ordering policy should be allowed to reorder an application when demand changes.

  • YARN-3654 ContainerLogsPage web UI should not have meta-refresh.

  • YARN-3707 RM Web UI queue filter doesn't work.

  • YARN-3740 Fixed the typo in the configuration name: APPLICATION_HISTORY_PREFIX_MAX_APPS.