APAC CIOOutlook

    LinkedIn Open-Sources Dr. Elephant Hadoop, Spark Tuning Tool

    Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark that automatically gathers job metrics, analyzes them, and presents the results in a simple, easily consumed form.

    By Apac CIOOutlook | Thursday, April 14, 2016

    FREMONT, CA: LinkedIn has open-sourced Dr. Elephant, a performance monitoring and tuning tool that helps Hadoop and Spark users understand, analyze, and improve their workflows.

    Dr. Elephant automatically gathers all the metrics for a job, runs analysis on them, and presents the results in a simple way for easy consumption. The goal of the tool is to improve developer productivity and increase cluster efficiency by making jobs easier to tune. It analyzes Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights into how a job performed, and then uses the results to suggest how to tune the job so that it runs more efficiently.
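
    To make "pluggable, configurable, rule-based heuristics" concrete, here is a minimal Python sketch of the general pattern. It is not Dr. Elephant's actual code or API: the registry, the rule names (`mapper_spill`, `short_tasks`), the metric keys, and the thresholds are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Optional

# A rule takes job metrics plus its own parameters and returns a tuning
# suggestion, or None when the job looks fine for that rule.
Rule = Callable[[dict, dict], Optional[str]]

# Hypothetical plug-in registry: rule name -> rule function.
REGISTRY: Dict[str, Rule] = {}


def heuristic(name: str) -> Callable[[Rule], Rule]:
    """Decorator that plugs a rule into the registry under `name`."""
    def register(rule: Rule) -> Rule:
        REGISTRY[name] = rule
        return rule
    return register


@heuristic("mapper_spill")
def mapper_spill(metrics: dict, params: dict) -> Optional[str]:
    # Illustrative rule: flag jobs that spill many records per output record.
    ratio = metrics["spilled_records"] / max(metrics["output_records"], 1)
    if ratio > params["max_ratio"]:
        return f"spill ratio {ratio:.1f} exceeds {params['max_ratio']}; review sort buffer size"
    return None


@heuristic("short_tasks")
def short_tasks(metrics: dict, params: dict) -> Optional[str]:
    # Illustrative rule: flag jobs whose tasks finish too quickly.
    if metrics["avg_task_runtime_sec"] < params["min_avg_runtime_sec"]:
        return "tasks finish very quickly; consider fewer, larger tasks"
    return None


def analyze(metrics: dict, config: Dict[str, dict]) -> Dict[str, str]:
    """Run only the rules named in `config`, each with its own parameters."""
    findings = {}
    for name, params in config.items():
        suggestion = REGISTRY[name](metrics, params)
        if suggestion is not None:
            findings[name] = suggestion
    return findings
```

    In this sketch the configuration decides both which rules run and what thresholds they use, so a rule can be added, disabled, or retuned without touching the analysis loop.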

    LinkedIn's employees have different levels of Hadoop experience and use different frameworks to run their Hadoop jobs. As the number of Hadoop users grew, holding regular sessions for different users on distinct frameworks no longer worked. LinkedIn could not verify whether a given job achieved optimal performance, nor could it guarantee performance coverage, so it needed to standardize and automate the process.

    Hadoop is an open-source software framework for the distributed storage and processing of large datasets, built from a number of components that interact with each other. Apache Spark is a fast engine for big data processing, with built-in modules for streaming, SQL, machine learning, and graph processing.

    How Dr. Elephant Works

    Dr. Elephant retrieves a list of all recently succeeded and failed applications from the YARN resource manager at regular intervals. The metadata for each application (the job counters, configurations, and task data) is fetched from the Job History server. Once it has the metadata, Dr. Elephant runs a set of heuristics on it and generates a diagnostic report on how the individual heuristics and the job as a whole performed. Each result is then tagged with one of five severity levels to indicate potential performance problems.
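
    The flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's implementation: the two fetch functions are stand-ins for the YARN resource manager and Job History server queries, the metric names and thresholds are invented, the five level names are illustrative, and taking the worst heuristic result as the job-level severity is an assumption of this sketch.

```python
from enum import IntEnum
from typing import Callable, Dict, Iterable, Sequence


class Severity(IntEnum):
    # Five ordered levels; the names here are illustrative.
    NONE = 0
    LOW = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4


def severity_from_thresholds(value: float, limits: Sequence[float]) -> Severity:
    """Map a value onto a level using four ascending limits (LOW..CRITICAL)."""
    level = Severity.NONE
    for candidate, limit in zip(list(Severity)[1:], limits):
        if value >= limit:
            level = candidate
    return level


def data_skew(metadata: dict) -> Severity:
    # Hypothetical heuristic: largest task input relative to the median.
    ratio = metadata["max_task_input_mb"] / metadata["median_task_input_mb"]
    return severity_from_thresholds(ratio, (2, 4, 8, 16))


def gc_overhead(metadata: dict) -> Severity:
    # Hypothetical heuristic: share of CPU time spent in garbage collection.
    ratio = metadata["gc_ms"] / metadata["cpu_ms"]
    return severity_from_thresholds(ratio, (0.05, 0.10, 0.15, 0.20))


def analyze_once(
    list_apps: Callable[[], Iterable[str]],
    fetch_metadata: Callable[[str], dict],
    heuristics: Dict[str, Callable[[dict], Severity]],
) -> Dict[str, dict]:
    """One polling pass: list applications, fetch metadata, run heuristics."""
    reports = {}
    for app_id in list_apps():
        metadata = fetch_metadata(app_id)
        results = {name: rule(metadata) for name, rule in heuristics.items()}
        # Assumption: the job-level severity is the worst heuristic result.
        overall = max(results.values(), default=Severity.NONE)
        reports[app_id] = {"heuristics": results, "overall": overall}
    return reports


# Stand-in data in place of real resource manager / history server calls.
SAMPLE = {
    "app_1": {"max_task_input_mb": 800, "median_task_input_mb": 100,
              "gc_ms": 10, "cpu_ms": 1000},
    "app_2": {"max_task_input_mb": 110, "median_task_input_mb": 100,
              "gc_ms": 250, "cpu_ms": 1000},
}
HEURISTICS = {"data_skew": data_skew, "gc_overhead": gc_overhead}
reports = analyze_once(lambda: SAMPLE.keys(), SAMPLE.__getitem__, HEURISTICS)
```

    A scheduler or timer would call `analyze_once` at each interval; passing the fetch functions in as arguments keeps the analysis logic testable without a live cluster.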

    LinkedIn uses Dr. Elephant for many use cases, including monitoring how a flow is performing on the cluster, understanding why a flow is running slowly, identifying what can be tuned to improve a flow and how, comparing a flow against its previous executions, and troubleshooting.

    Apart from adding and improving heuristics and extending the tool to newer job types, LinkedIn plans to add job-specific tuning suggestions based on real-time metrics, visualizations of jobs' cluster resource usage and trends, better Spark integration, and integration with more schedulers.
