yarn
yarn architecture
YARN manages and allocates cluster resources and schedules/monitors jobs. These responsibilities (resource management and job scheduling/monitoring) were split out of the Hadoop MapReduce framework into separate daemons.
The YARN ResourceManager has two main components:
- Scheduler: manages resources (CPU, memory, network, disk, etc.) and allocates them to the running applications.
- The NodeManager reports node resource info (node status) to the Scheduler.
- The ApplicationMaster asks the Scheduler for resources.
- The Scheduler supports pluggable policies for partitioning resources among queues and applications. For example:
- CapacityScheduler allocates resources per tenant. It is designed for multi-tenant scenarios: a large cluster is shared while each organization gets a capacity guarantee, and a tenant can also use resources that others are not currently using. Strict ACLs keep one tenant's resources safe from the others. The primary abstraction is the queue: different tenants use different queues, and hierarchical queues allow further separation within a single tenant (see the config sketch after this list).
- FairScheduler assigns resources to applications such that, on average, all apps get an equal share of resources over time. It is mainly designed for sharing a cluster among many users: short apps finish in a reasonable time without starving long-lived apps (resources free up for new apps as they are submitted).
- ApplicationsManager: accepts job submissions, negotiates the first container for executing the ApplicationMaster, and monitors/restarts the ApplicationMaster on failure.
- The ApplicationMaster (one per application, not part of the ResourceManager) is the one that:
- applies to the Scheduler for resources
- boots up the job execution
- monitors the job execution status
- tells the ApplicationsManager whether the job fails or succeeds.
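To make the CapacityScheduler point concrete, here is a minimal sketch of a `capacity-scheduler.xml` with two hypothetical tenant queues under root; the queue names and percentages are made-up values for illustration, not taken from any real deployment:

```xml
<!-- capacity-scheduler.xml: two hypothetical tenant queues under root -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>tenant_a,tenant_b</value>
  </property>
  <property>
    <!-- guaranteed capacity for tenant_a, as a percent of the parent queue -->
    <name>yarn.scheduler.capacity.root.tenant_a.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.tenant_b.capacity</name>
    <value>30</value>
  </property>
  <property>
    <!-- tenant_a may borrow idle capacity from others up to this limit -->
    <name>yarn.scheduler.capacity.root.tenant_a.maximum-capacity</name>
    <value>100</value>
  </property>
</configuration>
```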
Other components:
- ReservationSystem: reserves resources ahead of time to ensure the predictable execution of important jobs.
- YARN Federation: joins multiple independent clusters into one larger virtual cluster so YARN can scale beyond the limits of a single cluster.
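Both features are off by default and are switched on in `yarn-site.xml`; a minimal sketch showing only the opt-in flags (a real setup also needs state-store and sub-cluster configuration, which is omitted here):

```xml
<!-- yarn-site.xml: opt-in flags for ReservationSystem and Federation -->
<property>
  <name>yarn.resourcemanager.reservation-system.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.federation.enabled</name>
  <value>true</value>
</property>
```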
an example
Take Hive as an example:
load data – non-distributed-computing jobs
- The user runs a Hive command to load data into a Hive table, e.g.

```sql
LOAD DATA LOCAL INPATH '/path/to/datafile/' OVERWRITE INTO TABLE table_name;
```

- Hive calls HDFS to write the data.
- The NodeManager reports the updated node status to the Scheduler.
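To sanity-check the load, you can list the table's directory in HDFS; the path below assumes the default Hive warehouse location and the table name from above, so adjust both to your setup:

```bash
# list the files Hive wrote for the table (default warehouse path, adjust as needed)
hdfs dfs -ls /user/hive/warehouse/table_name
```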
query data – distributed computing jobs
- The user runs a Hive command to query data, e.g.

```sql
select count(*) from xxx;
```

- Hive submits a map-reduce job to the ApplicationsManager.
- The ApplicationsManager obtains a container from the Scheduler for the ApplicationMaster and boots it (via the NodeManager on that node).
- The ApplicationMaster applies to the Scheduler for resources to execute the map-reduce tasks and boots them.
- The map-reduce job reads its input data from HDFS and writes its output data back to HDFS.
- The map-reduce job reports its status to the ApplicationMaster.
- The ApplicationsManager restarts the ApplicationMaster on failure (application failure or hardware failure): when the application fails, the tasks inform the ApplicationMaster, the ApplicationsManager detects it, and then reboots the ApplicationMaster.
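One way to watch this flow, assuming the Hive and YARN CLIs are on the path (the table name `xxx` is just the placeholder from above, and the application id is a placeholder to fill in):

```bash
# run the query; Hive prints the MapReduce job / tracking URL it launches on YARN
hive -e "select count(*) from xxx"

# in another terminal: list running YARN applications with their state and progress
yarn application -list

# after the app finishes, pull its aggregated logs (requires log aggregation enabled)
yarn logs -applicationId <application_id>
```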
JobHistoryServer
On the YARN "All Applications" page, there is a link to the job history of each application. However, you must configure the JobHistoryServer for that link to work.
Follow the instructions below to configure jobhistory in Hadoop. Also see Hadoop Cluster Setup for information about starting the log and jobhistory servers, the History Server REST API, and the JobHistoryServer javadoc.
Notes:
The host part of `mapreduce.jobhistory.webapp.address` and `mapreduce.jobhistory.address` may need to be set to the real IP (get it from `ipconfig getifaddr en0` on macOS) or some other resolvable host (e.g. cncherish.local) instead of `localhost`. When you start the history server, you can see the startup host in the log. It may look like this:
```
STARTUP_MSG: Starting JobHistoryServer
STARTUP_MSG: host = CNcherish.local/192.168.xx.xxx
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.1.1
```

This might be because the JobHistoryServer told the YARN web proxy that its host is 'cncherish.local/192.168.xx.xxx' (mapping the host 'cncherish.local' to the real IP '192.168.xx.xxx'), while YARN believes the history host for map-reduce jobs is 'localhost', taken from `mapred-site.xml` (the config for map-reduce jobs). The conflicting host info makes the jobhistory link unreachable.
1. Add the following properties into `mapred-site.xml` (the config for the map-reduce framework) to persist the jobhistory logs in HDFS:
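A minimal sketch of the relevant properties; the hostname cncherish.local is the one from the log above and the HDFS directories are example values, so adapt both to your own setup:

```xml
<!-- config to persist the jobhistory logs in hdfs -->
<property>
  <!-- RPC address of the JobHistoryServer -->
  <name>mapreduce.jobhistory.address</name>
  <value>cncherish.local:10020</value>
</property>
<property>
  <!-- web UI / REST address of the JobHistoryServer -->
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>cncherish.local:19888</value>
</property>
<property>
  <!-- where MapReduce writes history files while jobs are running -->
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/mr-history/tmp</value>
</property>
<property>
  <!-- where finished job history files are moved for the history server -->
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/mr-history/done</value>
</property>
```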
2. Add the following properties into `yarn-site.xml` (the config for YARN, i.e. the ResourceManager):
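A minimal sketch using the same placeholder hostname: log aggregation is what makes container logs available after an app finishes, and `yarn.log.server.url` points the ResourceManager at the JobHistoryServer UI so the history/log links for finished apps resolve:

```xml
<property>
  <!-- collect container logs into HDFS after an application finishes -->
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- where the ResourceManager redirects for logs of finished applications -->
  <name>yarn.log.server.url</name>
  <value>http://cncherish.local:19888/jobhistory/logs</value>
</property>
```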
3. Start the historyserver:

```bash
# The following command will run the server as a background daemon (Hadoop 3.x)
mapred --daemon start historyserver
```
rest api
- YARN REST API: through Postman, GET http://localhost:8088/ws/v1/cluster/apps (get all the apps)
- history rest api: http://cnpzzheng.local:19888/ws/v1/history (get server info)
- use port 19888 (the webapp/REST port) instead of 10020 (the RPC port)
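The same calls from the command line with curl, using the hosts and ports from the URLs above (adjust them to your own config):

```bash
# list all applications known to the ResourceManager
curl -s "http://localhost:8088/ws/v1/cluster/apps"

# JobHistoryServer info (note the 19888 webapp port, not the 10020 RPC port)
curl -s "http://cnpzzheng.local:19888/ws/v1/history"
```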
CheatSheet
```bash
# Check the resource usage of running applications
yarn top
```
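A few more commands that are handy alongside the one above; all are standard YARN CLI subcommands, and the application id and queue name are placeholders:

```bash
# list cluster nodes with their state and total/used resources
yarn node -list -all

# show state, progress, and resource usage of a single application
yarn application -status <application_id>

# show the status (capacity, used capacity) of a scheduler queue
yarn queue -status default
```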