Configuring Hadoop Services Using Ansible Playbook

Dec 1, 2020

We have been given two tasks by Vimal Sir: to automate and manage Hadoop using Ansible, and to use an Ansible playbook to manage the HTTPD service efficiently.

Task Description📄

🔰 11.1 Configure Hadoop and start cluster services using Ansible Playbook

🔰 11.3 Restarting the HTTPD service is not idempotent in nature and also consumes more resources. Suggest a way to rectify this challenge in an Ansible playbook

11.1:- Configure Hadoop and start cluster services using Ansible Playbook

Type this command in your VM; it will download and install Ansible for you.

pip3 install ansible

Now we have to create an inventory file; the name can be anything. In my case I made a file named /etc/myhosts.txt and wrote in it the IPs of the other virtual machines (the VMs on which you want to configure and set up the Hadoop namenode and datanode), along with other details such as the user (root) and password.

Now check the Ansible version by typing:
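The playbook further down targets two host groups, namenode and datanode, so the inventory has to define both. My own file is not shown here, so the following is only a sketch of the same idea written as a YAML inventory. It assumes the file is saved under a hypothetical name such as /etc/myhosts.yml (the YAML inventory plugin expects a .yml or .yaml extension by default), and the datanode address and the password are placeholders, not real values.

```yaml
# Hypothetical inventory sketch; the datanode IP and the password are placeholders.
all:
  children:
    namenode:
      hosts:
        192.168.43.102:            # same address as name_ip in var.yml
          ansible_user: root
          ansible_password: "CHANGE_ME"
          ansible_connection: ssh
    datanode:
      hosts:
        192.0.2.10:                # placeholder address for the datanode VM
          ansible_user: root
          ansible_password: "CHANGE_ME"
          ansible_connection: ssh
```

Password-based SSH logins from Ansible normally need the sshpass program on the controller node; key-based login avoids storing the password in the file.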

ansible --version

According to the above image, Ansible reads its configuration from the /etc/ansible/ansible.cfg file, so configure this file (for example, point its inventory setting at the file created above).

See all the hosts by typing ansible all --list-hosts.

Ping the hosts to check whether there is SSH connectivity between the virtual machines.
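The usual ad-hoc way to do this check is ansible all -m ping. The same check can also be written as a tiny playbook, which is handy if you want to keep it next to the main playbook. This is a minimal sketch; the file name you save it under is up to you.

```yaml
# Minimal connectivity check: the ping module logs in over SSH
# and verifies that a usable Python is available on each host.
- hosts: all
  gather_facts: no
  tasks:
    - name: Check that Ansible can reach and use each host
      ping:
```

Note that this is not an ICMP ping; a host only answers with pong when the SSH login and the remote Python both work.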

Now I am ready with my playbook code.

- hosts: namenode
  vars_files:
    - var.yml
  tasks:
    - name: Copy Java Software
      copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root/"

    - name: Copy Hadoop Software
      copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"

    - name: Install Java Software
      shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
      register: java_install

    - name: java install information
      debug:
        var: java_install

    # Install Hadoop only if the Java installation succeeded
    - name: Install Hadoop Software
      shell: "rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force"
      register: hadoop_install
      when: java_install.rc == 0

    - name: hadoop install information
      debug:
        var: hadoop_install

    - name: Create Directory
      file:
        state: directory
        path: "{{ name_dir }}"

    - name: Copy hdfs-site.xml file
      template:
        src: "n_hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: Copy core-site.xml file
      template:
        src: "n_core-site.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: Format the namenode directory
      shell: "echo Y | hadoop namenode -format"

    - name: Start Namenode Service
      shell: "hadoop-daemon.sh start namenode"

- hosts: datanode
  vars_files:
    - var.yml
  tasks:
    - name: Copy Java Software
      copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root/"

    - name: Copy Hadoop Software
      copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"

    - name: Install Java Software
      shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
      register: java_install

    - name: java install information
      debug:
        var: java_install

    # Install Hadoop only if the Java installation succeeded
    - name: Install Hadoop Software
      shell: "rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force"
      register: hadoop_install
      when: java_install.rc == 0

    - name: hadoop install information
      debug:
        var: hadoop_install

    - name: Create Directory
      file:
        state: directory
        path: "{{ data_dir }}"

    - name: Copy hdfs-site.xml file
      template:
        src: "d_hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: Copy core-site.xml file
      template:
        src: "d_core-site.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: Start Datanode Service
      shell: "hadoop-daemon.sh start datanode"
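One thing to keep in mind: the shell tasks above run every time, so a second run would try to install the RPM again and would format the namenode again. A possible way to guard them is the creates argument, which skips the command when the given path already exists. This is only a sketch and not what I ran; it assumes the Hadoop RPM creates /etc/hadoop (the playbook writes its config files there) and that the namenode template points dfs.name.dir at name_dir, so that formatting leaves a current/VERSION file in that directory.

```yaml
# Sketch: guard the non-idempotent shell tasks with "creates".
# The guard paths are assumptions, adjust them to your setup.
- name: Install Hadoop Software only if it is not there yet
  shell: "rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force"
  args:
    creates: /etc/hadoop

- name: Format the namenode directory only once
  shell: "echo Y | hadoop namenode -format"
  args:
    creates: "{{ name_dir }}/current/VERSION"
```

With these guards a repeated run reports the tasks as skipped instead of reinstalling the package or wiping the namenode metadata.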

And my var file where I store the variables.

name_ip: 192.168.43.102
name_port: 9001
name_dir: /nn8
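The template files themselves (n_hdfs-site.xml, n_core-site.xml and so on) are not listed in this post. To give an idea of how the variables end up in the Hadoop configuration, here is a sketch that writes equivalent files inline with the copy module instead of separate template files. It assumes the Hadoop 1.x property names fs.default.name and dfs.name.dir; my actual templates may differ in detail.

```yaml
# Sketch for the namenode: inline content instead of template files.
# Jinja2 variables inside "content" are filled in from var.yml.
- name: Write core-site.xml from the variables
  copy:
    dest: "/etc/hadoop/core-site.xml"
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://{{ name_ip }}:{{ name_port }}</value>
        </property>
      </configuration>

- name: Write hdfs-site.xml from the variables
  copy:
    dest: "/etc/hadoop/hdfs-site.xml"
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>dfs.name.dir</name>
          <value>{{ name_dir }}</value>
        </property>
      </configuration>
```

On the datanode the hdfs-site.xml property would be dfs.data.dir with the value of data_dir, while core-site.xml stays the same so that the datanode can find the namenode.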
data_dir: /dn8

Now check the syntax of the main playbook with ansible-playbook --syntax-check hadoop.yml, and after that run the playbook by typing ansible-playbook hadoop.yml. It will give output like this.

ansible-playbook --syntax-check hadoop.yml
ansible-playbook hadoop.yml

Now I check on the Namenode virtual machine whether everything went well.

In the above image you can see that at first Java and Hadoop are not installed and the jps command does not work, but after running the playbook everything is configured.

In the above image, you can see that the /etc/hadoop/hdfs-site.xml and /etc/hadoop/core-site.xml files are configured after running the playbook.

Now I check on the Datanode virtual machine whether everything went well.

In the above image you can see that at first Java and Hadoop are not installed and the jps command does not work, but after running the playbook everything is configured.

In the above image, you can see that the /etc/hadoop/hdfs-site.xml and /etc/hadoop/core-site.xml files are configured after running the playbook.

You can check the report of the Hadoop cluster by typing hadoop dfsadmin -report.

hadoop dfsadmin -report

Hadoop setup completed.

11.3:- Restarting the HTTPD service is not idempotent in nature and also consumes more resources. Suggest a way to rectify this challenge in an Ansible playbook

Ping the hosts to check whether there is SSH connectivity between the virtual machines.

Now I am ready with my playbook code.

---
- hosts: all
  vars_files:
    - var1.yml
  tasks:
    - name: "Create directory for dvd mount"
      file:
        state: directory
        path: "{{ dvd_dir }}"

    - name: "Mount the dvd to the directory"
      mount:
        src: "/dev/cdrom"
        path: "{{ dvd_dir }}"
        state: mounted
        fstype: "iso9660"

    - name: "Configure AppStream for yum"
      yum_repository:
        baseurl: "{{ dvd_dir }}/AppStream"
        name: "dvd1"
        description: "dvd1 for AppStream packages"
        gpgcheck: no

    - name: "Configure BaseOS for yum"
      yum_repository:
        baseurl: "{{ dvd_dir }}/BaseOS"
        name: "dvd2"
        description: "dvd2 for BaseOS packages"
        gpgcheck: no

    - name: "Install package"
      package:
        name: "httpd"
        state: present
      register: x

    - name: "Create directory for web server"
      file:
        state: directory
        path: "{{ doc_root }}"
      register: y

    - name: "Copy the configuration file"
      template:
        dest: "/etc/httpd/conf.d/lw.conf"
        src: "lw.conf"
      when: x.rc == 0
      notify:
        - Start service

    - name: "Copy the web page"
      copy:
        dest: "{{ doc_root }}/index.html"
        content: "this is neeew web page\n"
      when: y.failed == false

    - name: "start httpd service"
      service:
        name: "httpd"
        state: started

    - name: "Create firewall rule"
      firewalld:
        port: "{{ http_port }}/tcp"
        state: enabled
        permanent: yes
        immediate: yes

  handlers:
    - name: Start service
      service:
        name: "httpd"
        state: restarted

And my var file where I store the variables.

doc_root: "/var/www/arya"
dvd_dir: "/dvd5"
http_port: 8082

ansible-playbook --syntax-check hadoop.yml
ansible-playbook hadoop.yml

Now you can check on the virtual machine whose IP is 192.168.43.131, where I want to deploy the web server.

Now you can check from the browser whether the web server is running.

Now if you run the playbook again, it will show that the service is already started, so there is no need to restart it again. This is possible because of the handlers and notify keywords in Ansible: the restart handler runs only when the task that notifies it (copying the configuration file) reports a change.
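Stripped down to the essentials, the pattern looks like the sketch below. It reuses the lw.conf template from the playbook; the handler name "Restart httpd" is just an illustrative name, and the only requirement is that the notify value matches the handler name exactly.

```yaml
# Minimal sketch of the notify/handler pattern.
- hosts: all
  tasks:
    - name: Copy the configuration file
      template:
        src: "lw.conf"
        dest: "/etc/httpd/conf.d/lw.conf"
      # Notifies the handler only when the file content actually changed
      notify: Restart httpd

    - name: Make sure httpd is running
      service:
        name: "httpd"
        state: started      # idempotent: does nothing if already running

  handlers:
    - name: Restart httpd
      service:
        name: "httpd"
        state: restarted    # always restarts, so keep it behind notify
```

The state: started task is safe to run on every play because it only acts when the service is stopped, while state: restarted would restart httpd on every run if it were an ordinary task. Handlers run once at the end of the play, even if several tasks notify them.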

Now I change my var file where I store the variables.

doc_root: "/var/www/harsh"
dvd_dir: "/dvd5"
http_port: 8083

Now I run my playbook again with new variables.

Now you can check on the virtual machine whose IP is 192.168.43.131, where I want to deploy the web server.

You can check the final output from the browser by trying both port numbers, 8082 as well as 8083.


Written by Arya Dhorajiya