
HDFS Architecture and Four Mechanisms

HDFS is a distributed file system: it stores files and locates them through a directory tree. It is built from multiple servers, and each server in the cluster plays its own role. It suits write-once, read-many workloads and does not support modifying files in place. That makes it a good fit for data analysis and a poor fit for network-disk (cloud drive) applications.

NameNode:

DataNode:

Client:

Secondary NameNode:

Files in HDFS are physically stored as data blocks, and the block size can be set with the configuration parameter dfs.blocksize. The default size is 128 MB in Hadoop 2.x and 64 MB in older versions.
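To make the block arithmetic concrete, here is a minimal sketch of how a file's size maps onto blocks. The function name split_into_blocks is made up for illustration and is not part of any Hadoop API; the only value taken from the text is the 128 MB default.

```python
# Illustrative only: split_into_blocks is a hypothetical helper, not a Hadoop API.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.x default for dfs.blocksize


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes in bytes of the blocks needed for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    mb = 1024 * 1024
    sizes = split_into_blocks(300 * mb)
    print(len(sizes), [s // mb for s in sizes])
```

A 300 MB file therefore needs three blocks: two full 128 MB blocks and one 44 MB block. The last block takes up only as much disk space as the data it holds.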

Each DataNode periodically sends a heartbeat to the NameNode to report its status.

Heartbeat content:

Heartbeat reporting cycle:

How the NameNode decides that a DataNode is down:

The NameNode has missed 10 consecutive heartbeats from the DataNode, and two check intervals have passed as well.


Check time: when the NameNode stops receiving heartbeats from a DataNode, it actively checks that DataNode at this interval.
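The rule above (10 missed heartbeats plus 2 check intervals) can be worked out as a timeout. The sketch below assumes the stock Hadoop defaults, which the text itself does not state: dfs.heartbeat.interval at 3 seconds and dfs.namenode.heartbeat.recheck-interval at 300000 milliseconds. The function name is hypothetical.

```python
# Assumed defaults (not given in the text above):
HEARTBEAT_INTERVAL_S = 3        # dfs.heartbeat.interval, in seconds
RECHECK_INTERVAL_MS = 300_000   # dfs.namenode.heartbeat.recheck-interval, in milliseconds


def dead_node_timeout_seconds(heartbeat_interval_s=HEARTBEAT_INTERVAL_S,
                              recheck_interval_ms=RECHECK_INTERVAL_MS):
    """Seconds of silence after which a DataNode is treated as dead:
    10 heartbeat intervals plus 2 recheck intervals."""
    return 10 * heartbeat_interval_s + 2 * recheck_interval_ms / 1000


if __name__ == "__main__":
    timeout = dead_node_timeout_seconds()
    print(f"{timeout:.0f} s = {timeout / 60:.1f} min")
```

With those defaults the timeout is 630 seconds, that is 10 minutes 30 seconds. Shortening either interval makes failure detection faster at the cost of more false alarms on a busy network.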

When HDFS starts, it first enters safe mode and leaves it once the specified conditions are met. While in safe mode, no operation that modifies metadata is allowed.
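One of the conditions for leaving safe mode is that enough blocks have been reported by DataNodes. The sketch below is a simplification that covers only this one condition, assuming the default value 0.999 of dfs.namenode.safemode.threshold-pct. The real NameNode also applies a minimum DataNode count and an extension period, which are left out here, and the function name is hypothetical.

```python
# Simplified: models only the reported-block threshold, not the full exit logic.
SAFEMODE_THRESHOLD_PCT = 0.999  # assumed default of dfs.namenode.safemode.threshold-pct


def block_threshold_reached(reported_blocks, total_blocks,
                            threshold_pct=SAFEMODE_THRESHOLD_PCT):
    """True when the fraction of blocks that have reached minimum replication
    is at least threshold_pct."""
    if total_blocks == 0:
        # An empty file system has nothing to wait for.
        return True
    return reported_blocks / total_blocks >= threshold_pct


if __name__ == "__main__":
    print(block_threshold_reached(990, 1000))
    print(block_threshold_reached(1000, 1000))
```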

Introduction to HDFS metadata (three parts):

Storage location of HDFS metadata:

Manually exit or enter safe mode.
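Safe mode is switched by hand with the hdfs dfsadmin -safemode command, which accepts get, enter, leave and wait. The wrapper below is only a sketch: the helper names are made up, and run_safemode assumes the hdfs client is on PATH and the caller has administrator rights.

```python
import subprocess

VALID_ACTIONS = ("get", "enter", "leave", "wait")


def safemode_command(action):
    """Build the argument list for `hdfs dfsadmin -safemode <action>`."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown safemode action: {action!r}")
    return ["hdfs", "dfsadmin", "-safemode", action]


def run_safemode(action):
    """Run the command and return its output (needs a configured hdfs client)."""
    result = subprocess.run(safemode_command(action),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without a cluster.
    for action in VALID_ACTIONS:
        print(" ".join(safemode_command(action)))
```

Here get reports the current state, enter and leave switch it, and wait blocks until safe mode ends, which is handy in startup scripts.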

After the cluster is started:

The data of each file is stored in blocks, and each block has multiple replicas spread across different machines. By default, each block has 3 replicas.

In production, the rack-awareness policy has to be configured manually.
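One common way to configure rack awareness is a topology script referenced by net.topology.script.file.name in core-site.xml. Hadoop passes host names or IP addresses as arguments and expects one rack path per argument in return. The sketch below shows the shape of such a script; the addresses and rack names in RACK_BY_HOST are invented for illustration and must be replaced with the real cluster layout.

```python
#!/usr/bin/env python3
import sys

# Hypothetical mapping: replace with the actual hosts and racks of the cluster.
RACK_BY_HOST = {
    "192.168.1.11": "/dc1/rack1",
    "192.168.1.12": "/dc1/rack1",
    "192.168.1.21": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"  # used for any host that is not listed


def resolve_racks(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop calls the script with one or more hosts and reads the racks from stdout.
    for rack in resolve_racks(sys.argv[1:]):
        print(rack)
```

Falling back to a default rack for unknown hosts keeps the script from failing when a new node joins before the mapping is updated.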

A cluster is balanced when the percentage of storage used on each node is roughly the same.

The cluster balances load automatically, but the transfer speed is relatively slow, so this is adequate only when there are few nodes.

If the cluster is large, load balancing has to be triggered manually, and it should be run while the cluster is idle.
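Manual balancing is started with hdfs balancer, optionally with -threshold <percent>. A node counts as balanced when its utilization is within that many percentage points of the cluster-wide utilization. The sketch below applies this rule to a made-up set of nodes, assuming the default threshold of 10. The function name and the sample numbers are illustrative only, and the real balancer then also plans the block moves, which is not modelled here.

```python
def classify_nodes(usage_by_node, threshold=10.0):
    """usage_by_node maps node name -> (used_bytes, capacity_bytes).
    Returns (cluster_pct, over_utilized, under_utilized)."""
    total_used = sum(used for used, _ in usage_by_node.values())
    total_capacity = sum(cap for _, cap in usage_by_node.values())
    cluster_pct = 100.0 * total_used / total_capacity

    over, under = [], []
    for node, (used, cap) in sorted(usage_by_node.items()):
        node_pct = 100.0 * used / cap
        if node_pct > cluster_pct + threshold:
            over.append(node)    # candidate source of block moves
        elif node_pct < cluster_pct - threshold:
            under.append(node)   # candidate target of block moves
    return cluster_pct, over, under


if __name__ == "__main__":
    # Hypothetical cluster: three nodes with equal capacity.
    sample = {"dn1": (90, 100), "dn2": (55, 100), "dn3": (40, 100)}
    cluster_pct, over, under = classify_nodes(sample)
    print(f"cluster {cluster_pct:.1f}% over={over} under={under}")
```

A smaller threshold gives a more even cluster but moves more data, which is another reason to run the balancer while the cluster is idle.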