Configuring Hadoop Services Using Ansible Playbook

Arya Dhorajiya
6 min read · Dec 1, 2020

Vimal Sir has given us two tasks: automate and manage Hadoop using Ansible, and use an Ansible playbook to manage the HTTPD service efficiently.

Task Description📄

🔰 11.1 Configure Hadoop and start cluster services using Ansible Playbook

🔰 11.3 Restarting the HTTPD service is not idempotent in nature and also consumes more resources. Suggest a way to rectify this challenge in an Ansible playbook

11.1:- Configure Hadoop and start cluster services using Ansible Playbook

Run this command in your VM to install Ansible.

pip3 install ansible

Next, create an inventory file with any name; in my case it is /etc/myhosts.txt. In it, write the IP addresses of your other virtual machines (the ones where you want to configure and set up the Hadoop namenode and datanode), along with connection details such as the root user and password.
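The contents of my inventory file are not shown here. As a rough sketch, the same information could be written in Ansible's YAML inventory format. The group names namenode and datanode match the hosts used in the playbook below, and 192.168.43.102 is the namenode IP from my var file; the DATANODE_IP entry, the password value, and the file name /etc/myhosts.yml are placeholders of mine (as far as I know, the YAML inventory plugin expects a .yml or .yaml extension by default).

```yaml
# Hypothetical /etc/myhosts.yml, an illustrative YAML-format inventory
all:
  vars:
    ansible_user: root
    ansible_password: "CHANGE_ME"   # placeholder, use your own password
    ansible_connection: ssh
  children:
    namenode:
      hosts:
        192.168.43.102:             # namenode IP (name_ip in var.yml)
    datanode:
      hosts:
        DATANODE_IP:                # placeholder, replace with your datanode VM's IP
```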

Now check the Ansible version by typing:

ansible --version

According to the output above, Ansible reads its configuration from the /etc/ansible/ansible.cfg file, so configure this file (for example, point its inventory setting at your hosts file).

List all the hosts by typing ansible all --list-hosts.

Ping the hosts to check that there is SSH connectivity between the virtual machines.
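The usual way to do this is the ad-hoc command ansible all -m ping. The same check can also be written as a tiny playbook; this is only a sketch, and the task name is my own.

```yaml
# Connectivity check: succeeds only if Ansible can log in over SSH
# and run Python on every host in the inventory
- hosts: all
  gather_facts: no
  tasks:
    - name: Check SSH connectivity to every host
      ping:
```

Note that the ping module does not send ICMP packets; it logs in to each host and returns "pong" on success.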

Now I am ready with my playbook code.

- hosts: namenode
  vars_files:
    - var.yml
  tasks:
    - name: Copy Java Software
      copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root/"
    - name: Copy Hadoop Software
      copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"
    - name: Install Java Software
      shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
      register: java_install
    - name: java install information
      debug:
        var: java_install
    - name: Install Hadoop Software
      shell: "rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force"
      register: hadoop_install
      when: java_install.rc == 0
    - name: hadoop install information
      debug:
        var: hadoop_install
    - name: Create Directory
      file:
        state: directory
        path: "{{ name_dir }}"
    - name: Copy hdfs-site.xml file
      template:
        src: "n_hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - name: Copy core-site.xml file
      template:
        src: "n_core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    - name: Format the namenode directory
      shell: "echo Y | hadoop namenode -format"
    - name: Start Namenode Service
      shell: "hadoop-daemon.sh start namenode"

- hosts: datanode
  vars_files:
    - var.yml
  tasks:
    - name: Copy Java Software
      copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root/"
    - name: Copy Hadoop Software
      copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"
    - name: Install Java Software
      shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
      register: java_install
    - name: java install information
      debug:
        var: java_install
    - name: Install Hadoop Software
      shell: "rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force"
      register: hadoop_install
      when: java_install.rc == 0
    - name: hadoop install information
      debug:
        var: hadoop_install
    - name: Create Directory
      file:
        state: directory
        path: "{{ data_dir }}"
    - name: Copy hdfs-site.xml file
      template:
        src: "d_hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - name: Copy core-site.xml file
      template:
        src: "d_core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    - name: Start Datanode Service
      shell: "hadoop-daemon.sh start datanode"
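One thing to keep in mind: the format and start tasks use the shell module, so they run again on every playbook run, and re-formatting a namenode that already holds data is destructive. A minimal sketch of how those namenode tasks could be guarded is below. It is not part of my original playbook. It assumes that Hadoop 1.x writes a current/VERSION file inside the name directory when formatting, and that jps is on the PATH; the task name "Check whether the namenode is already running" and the variable nn_running are my own.

```yaml
# Illustrative replacements for the last two tasks of the namenode play
- name: Format the namenode directory
  shell: "echo Y | hadoop namenode -format"
  args:
    # Skip the task if the directory was already formatted
    # (assumption: formatting creates current/VERSION under name_dir)
    creates: "{{ name_dir }}/current/VERSION"

- name: Check whether the namenode is already running
  shell: "jps | grep -w NameNode"
  register: nn_running
  changed_when: false   # a read-only check never changes the host
  failed_when: false    # grep returning 1 just means "not running"

- name: Start Namenode Service
  shell: "hadoop-daemon.sh start namenode"
  when: nn_running.rc != 0
```

The same pattern (with DataNode and start datanode) would apply to the datanode play.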

And here is my var file (var.yml), where I store the variables.

name_ip: 192.168.43.102
name_port: 9001
name_dir: /nn8
data_dir: /dn8
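The template files (n_core-site.xml and the others) are not shown in this post. To give an idea of how these variables are consumed, here is a sketch that writes an equivalent core-site.xml inline with the copy module instead of a template file. The property name fs.default.name is the Hadoop 1.x setting for the namenode address; the exact layout is my assumption and not necessarily what the real template contains.

```yaml
# Illustrative stand-in for the n_core-site.xml template
- name: Write core-site.xml inline
  copy:
    dest: "/etc/hadoop/core-site.xml"
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://{{ name_ip }}:{{ name_port }}</value>
        </property>
      </configuration>
```

With the values above, the rendered address would be hdfs://192.168.43.102:9001. The hdfs-site.xml templates would use name_dir and data_dir in the same way (for Hadoop 1.x, in the dfs.name.dir and dfs.data.dir properties).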

Now check the syntax of the main playbook with ansible-playbook --syntax-check hadoop.yml, and after that run the playbook with ansible-playbook hadoop.yml.

ansible-playbook --syntax-check hadoop.yml
ansible-playbook hadoop.yml

Now I check on the namenode virtual machine whether everything went well.

In the above image you can see that at first Java and Hadoop are not installed and the jps command does not work, but after running the playbook everything is configured.

In the above image, you can see that the /etc/hadoop/hdfs-site.xml and /etc/hadoop/core-site.xml files are configured after running the playbook.

Now I check on the datanode virtual machine whether everything went well.

In the above image you can see that at first Java and Hadoop are not installed and the jps command does not work, but after running the playbook everything is configured.

In the above image, you can see that the /etc/hadoop/hdfs-site.xml and /etc/hadoop/core-site.xml files are configured after running the playbook.

You can check the report of the Hadoop cluster by typing hadoop dfsadmin -report.

hadoop dfsadmin -report
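If you would rather collect the report from the control node instead of logging in to the namenode, a small play like the following could do it. This is a sketch; the variable name dfs_report is my own.

```yaml
# Runs the report on the namenode and prints it in the playbook output
- hosts: namenode
  tasks:
    - name: Collect the HDFS cluster report
      command: hadoop dfsadmin -report
      register: dfs_report
      changed_when: false   # reading the report changes nothing on the host

    - name: Show the report
      debug:
        var: dfs_report.stdout_lines
```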

Hadoop setup completed.

11.3:- Restarting the HTTPD service is not idempotent in nature and also consumes more resources. Suggest a way to rectify this challenge in an Ansible playbook

Ping the hosts to check that there is SSH connectivity between the virtual machines.

Now I am ready with my playbook code.

---
- hosts: all
  vars_files:
    - var1.yml
  tasks:
    - name: "Create directory for dvd mount"
      file:
        state: directory
        path: "{{ dvd_dir }}"
    - name: "Mount the dvd to the directory"
      mount:
        src: "/dev/cdrom"
        path: "{{ dvd_dir }}"
        state: mounted
        fstype: "iso9660"
    - name: "Configure AppStream for yum"
      yum_repository:
        baseurl: "{{ dvd_dir }}/AppStream"
        name: "dvd1"
        description: "dvd1 for AppStream packages"
        gpgcheck: no
    - name: "Configure BaseOS for yum"
      yum_repository:
        baseurl: "{{ dvd_dir }}/BaseOS"
        name: "dvd2"
        description: "dvd2 for BaseOS packages"
        gpgcheck: no
    - name: "Install package"
      package:
        name: "httpd"
        state: present
      register: x
    - name: "Create directory for web server"
      file:
        state: directory
        path: "{{ doc_root }}"
      register: y
    - name: "Copy the configuration file"
      template:
        dest: "/etc/httpd/conf.d/lw.conf"
        src: "lw.conf"
      when: x.rc == 0
      notify:
        - Start service
    - name: "Copy the web page"
      copy:
        dest: "{{ doc_root }}/index.html"
        content: "this is neeew web page\n"
      when: y.failed == false

    - name: "start httpd service"
      service:
        name: "httpd"
        state: started
    - name: "Create firewall rule"
      firewalld:
        port: "{{ http_port }}/tcp"
        state: enabled
        permanent: yes
        immediate: yes
  handlers:
    - name: Start service
      service:
        name: "httpd"
        state: restarted

And here is my var file (var1.yml), where I store the variables.

doc_root: "/var/www/arya"
dvd_dir: "/dvd5"
http_port: 8082

Now check the syntax of the main playbook with ansible-playbook --syntax-check hadoop.yml, and after that run the playbook with ansible-playbook hadoop.yml.

ansible-playbook --syntax-check hadoop.yml
ansible-playbook hadoop.yml

Now you can check on the virtual machine with IP 192.168.43.131, where I want to deploy the web server.

Now you can check from the browser whether the web server is running.

Now if you run the playbook again, it reports that the service is already started, so there is no need to restart it. This is possible because of the handlers and notify keywords in Ansible: the restart handler runs only when the task that notifies it reports a change.
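To isolate the idea, here is a stripped-down sketch of the same pattern. It uses the same lw.conf template and destination as my playbook; the handler name "Restart httpd" is a hypothetical, more descriptive alternative to "Start service".

```yaml
- hosts: all
  tasks:
    # A plain task with state: restarted would restart httpd on every run.
    # Notifying a handler instead queues the restart only when this task
    # reports "changed", that is, when the rendered file differs from the
    # one already on the host.
    - name: Copy the configuration file
      template:
        src: "lw.conf"
        dest: "/etc/httpd/conf.d/lw.conf"
      notify: Restart httpd

  handlers:
    - name: Restart httpd
      service:
        name: "httpd"
        state: restarted
```

On a second run with unchanged variables the template task reports "ok", so the handler is never triggered and the service is left alone.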

Now I change the values in my var file.

doc_root: "/var/www/harsh"
dvd_dir: "/dvd5"
http_port: 8083

Now I run my playbook again with new variables.

Now you can check again on the virtual machine with IP 192.168.43.131, where I want to deploy the web server.

You can check the final output from the browser by trying both port numbers, 8082 as well as 8083.
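The browser check can also be automated from the control node with the uri module. This is only a sketch: it assumes the web server VM is reachable at 192.168.43.131 from the machine running Ansible, and the variable name web_ip is my own.

```yaml
# Runs on the control node and requests the page on each port
- hosts: localhost
  gather_facts: no
  vars:
    web_ip: 192.168.43.131   # IP of the web server VM used in this post
  tasks:
    - name: Check that the web server answers on each port
      uri:
        url: "http://{{ web_ip }}:{{ item }}/"
        status_code: 200     # the task fails on any other status
      loop:
        - 8082
        - 8083
```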
