The Hadoop Distributed File System (HDFS) enforces permissions the same way on Windows and Linux deployments. The HDFS permissions model for files and directories shares much of the POSIX model; each file and directory is associated with an owner and a group.
On each node where HDP is installed, HDP sets up a HadoopUsers group and creates a hadoop user in that group. The hadoop user is the superuser in HDP. This user:
Is the owner of the HDP services installed on each Windows Server node.
Is the HDFS superuser. This superuser can modify the permissions of any HDFS directory or file, regardless of owner.
Is the Oozie proxy user.
Is the WebHCat proxy user.
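The owner/group model and the superuser role described above can be illustrated with a minimal sketch. This is not Hadoop's implementation: the `Inode` class, the `can_access` function, and the sample users and groups are hypothetical names chosen for illustration. The sketch assumes POSIX-style mode bits and that the hadoop superuser bypasses every check.

```python
from dataclasses import dataclass

SUPERUSER = "hadoop"  # the HDP superuser named in the text


@dataclass
class Inode:
    """Hypothetical stand-in for an HDFS file or directory entry."""
    owner: str
    group: str
    mode: int  # POSIX-style permission bits, e.g. 0o640


def can_access(user: str, user_groups: set, inode: Inode, want: int) -> bool:
    """Return True if `user` holds every bit in `want` (4=read, 2=write, 1=execute)."""
    if user == SUPERUSER:
        return True  # assumption: the superuser is never denied
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7  # owner bits
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 0o7  # group bits
    else:
        bits = inode.mode & 0o7  # bits for everyone else
    return (bits & want) == want


# Illustrative data only: a file readable and writable by its owner, readable by its group.
report = Inode(owner="alice", group="analysts", mode=0o640)
```

With this sketch, `can_access("bob", {"analysts"}, report, 4)` is true because bob is in the file's group, while a user outside that group is refused unless the user is hadoop.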
Note: HDP depends on user accounts on each cluster node for enforcing access rules to the data in HDFS.
HDP resolves membership in the machine's local groups, and skips groups coming from Active Directory. Although Active Directory groups are unsupported, Active Directory users are supported. You can create local groups on all nodes in the cluster, and manage group membership individually on each node. These local groups can contain Active Directory users.
For a Windows domain user, such as CORP\$win_username, the Hadoop code ignores the domain portion and treats the user identity as just the username, $win_username.
File ownership in HDFS and job submissions display as $win_username. Consequently, if the cluster is joined to multiple domain controllers, and the same username exists in multiple domains, Hadoop assumes they are the same user: DOMAIN1\$win_username = DOMAIN2\$win_username = DOMAIN3\$win_username.
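The collision described above can be shown with a short sketch. It only mimics the observable behavior (dropping the domain prefix) and is not the Hadoop code. The `short_name` function and the `jsmith` account are hypothetical.

```python
def short_name(identity: str) -> str:
    """Return the bare username, dropping any leading domain prefix."""
    # Everything after the last backslash is treated as the username;
    # an identity without a backslash is returned unchanged.
    return identity.rsplit("\\", 1)[-1]


# Illustrative identities: the same username in three domains, plus a local account.
identities = [r"DOMAIN1\jsmith", r"DOMAIN2\jsmith", r"DOMAIN3\jsmith", "jsmith"]

# All four collapse to a single identity once the domain is ignored.
distinct = {short_name(identity) for identity in identities}
```

Here `distinct` contains one entry, so files owned by any of these accounts would appear to belong to the same user.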
If a user account can create new users on machines that have direct access to the HDP cluster, then that account can create a hadoop user and get administrative access to HDP services.
You can manage user access on Windows much as you would on a non-secured Linux cluster by:
Putting all of the cluster nodes behind a firewall.
Only allowing HDFS client access and MapReduce job submission from specific machines (or a specific subnet).
Giving users accounts on machines with non-administrator permissions.
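The "specific machines (or a specific subnet)" rule is normally enforced by a firewall rather than by application code, but the logic can be sketched as follows. The addresses, the `ALLOWED_NETWORKS` list, and the `is_allowed_client` function are assumptions made for illustration and do not correspond to any HDP setting.

```python
import ipaddress

# Hypothetical allow-list: one gateway machine and one client subnet.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.5.17/32"),  # a single permitted machine
    ipaddress.ip_network("10.0.8.0/24"),   # a permitted subnet
]


def is_allowed_client(address: str) -> bool:
    """Return True if `address` falls inside any allow-listed network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ALLOWED_NETWORKS)
```

A client at 10.0.8.42 would pass this check, while one at 192.168.1.10 would be refused.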