December 8, 2022

A Second Wind for Apache Ambari

Apache Ambari, after lying dormant for three years, has had an unusual yet interesting 2022. In January, the project's governing committee retired Ambari and took it up the creaky stairs to the Apache Attic. However, later in the year, before the dust had settled, Ambari was recalled by popular demand, and active development on the project surged throughout the fourth quarter.

In this blog, learn about the journey of Apache Ambari — why it was retired and then brought back, and what the future holds for this unique open source project. 

What Is Apache Ambari?

Ambari is an open source project of the Apache Software Foundation that helps organizations provision, manage, and administer Hadoop clusters through a Web UI and RESTful API.

How Does Apache Ambari Work?

First, Ambari assists in the kickoff of big data projects with workflows for creating new clusters. Next, Ambari provides tools for day-to-day operations like starting, stopping, and configuring services associated with a cluster. Finally, Ambari promotes availability, stability, and continuous improvement of established environments through health monitoring, metrics collection, and trainable alerts.

Apache Ambari Architecture

Configuration and state are handled through a central server-side RESTful API, so data quality and consistency are maintained whether users are interacting through the browser-based Web UI or application automation is making inquiries or invoking changes programmatically. All interactions are brokered by the API through communication with an agent installed on each host.
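To make the "invoking changes programmatically" idea concrete, here is a minimal Python sketch that builds (but does not send) a request asking the Ambari server to stop a service. The host name, cluster name, and helper function are hypothetical, authentication is omitted, and the endpoint path, `X-Requested-By` header, and payload shape follow the commonly documented Ambari v1 REST conventions, so verify them against the Ambari version you run.

```python
import json
from urllib.request import Request


def build_service_state_request(base_url, cluster, service, state, context):
    """Build (without sending) a PUT request that asks Ambari to move a
    service to a target state. The helper name is illustrative only."""
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    payload = {
        # Free-text label for the operation.
        "RequestInfo": {"context": context},
        # In Ambari's state model, "STARTED" starts a service and
        # "INSTALLED" stops it (installed but not running).
        "Body": {"ServiceInfo": {"state": state}},
    }
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        # Ambari expects this header on modifying calls.
        headers={"X-Requested-By": "ambari"},
    )


# Hypothetical server and cluster names, used only for illustration.
req = build_service_state_request(
    "http://ambari.example.com:8080",
    "demo_cluster",
    "HDFS",
    "INSTALLED",
    "Stop HDFS via REST",
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen` plus credentials) is deliberately left out. The point is that the Web UI and automation go through the same API surface.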

A pluggable database infrastructure, supporting all common RDBMS varieties, is used to track state and configuration metadata on the Hadoop infrastructure. Furthermore, Ambari stores configuration changes to ensure proper version control and the ability to compare or revert changes. This is a major differentiator between Ambari and Cloudera Manager, a loosely equivalent proprietary piece of software.
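The compare step described above can be illustrated with a small Python sketch. It is not Ambari's internal implementation. It assumes that two stored versions of a configuration have already been retrieved as plain dictionaries of property names to values, and both the function name and the sample property values are hypothetical.

```python
def diff_config(old, new):
    """Return the properties added, removed, or changed between two
    versions of a configuration, each given as a dict of name -> value."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Two illustrative versions of an HDFS configuration (values are made up).
v1 = {"dfs.replication": "3", "dfs.blocksize": "134217728"}
v2 = {
    "dfs.replication": "2",
    "dfs.blocksize": "134217728",
    "dfs.namenode.handler.count": "64",
}

result = diff_config(v1, v2)
print(result)
```

Reverting, in this simplified picture, amounts to making the older dictionary the current one again. Keeping every version is what makes both operations possible.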

Apache Ambari architecture diagram

Apache Ambari Project History

Ambari is no spring chicken, with nearly 25K commits and more than 40 releases to date. Development started more than a decade ago, with the initial beta offering released in late 2011. The first couple of years had some starts and stops, but by 2013, commits were rolling in and the contributor list was growing quickly. From 2014 to 2019, the Ambari Community produced a very healthy flurry of development, averaging over five releases per year.

This development activity was aided in large part by the success of Hortonworks Data Platform (HDP), and Hortonworks engineers were well represented in the top committers list year after year.

Starting in 2019, Ambari development dropped off dramatically (see illustration below) and there were only two releases that year, followed by none in 2020, and just a single release in 2021.

Graph showing history of Apache Ambari contributions

Source: GitHub

This drop in releases correlated with the January 2019 merger of Cloudera and Hortonworks. Although this was considered a merger of equals between the two biggest players in the Hadoop space, the new combined company adopted the Cloudera brand and Cloudera Manager as the face of the enterprise. 

Apache Ambari Retired...

In January of 2022, Jayush Luniya, Cloudera engineer and chairman of the Apache Ambari Project Management Committee (PMC), invoked the standard Apache Software Foundation process, proposing to end development on the product and move Apache Ambari to the Apache Attic. Luniya shared that the motivating factor was a lack of engagement and participation from members of the governing body and the development community at large.

The retirement of Ambari left users with no clearly defined alternative. In fact, there were only two paths forward that began to surface:

  1. License a vendor solution (e.g. Cloudera Manager)
  2. Glue together multiple open source projects that solve various aspects of the problem (e.g. Mesos and YARN for cluster management; Ansible, Puppet, and Chef for deployment and configuration management; Prometheus for monitoring; Oozie and Jenkins for scheduling and automation)

In either case, adopters would be required to conduct some sort of cluster migration, and they would incur a higher cost of doing business. In nearly all situations, these new costs would be significant enough to threaten the sustainability of their application or platform.

...Apache Ambari Returns!

As the repercussions of HDP and Cloudera Express deprecations collided with Cloudera platform price increases, the Ambari community realized that vendor lock-in had backed them into a corner.

Luckily, the freedom of open source software provided a way out; the Apache Software Foundation supplied the map, and the Ambari Community took the reins. As a result, Apache Ambari was restored from the Attic in June of this year with 17 committers and a newly constituted 16-member PMC.

What's Next for Apache Ambari

Most of the initial work on Apache Ambari in late 2022 has been internal improvements, catching up with changes in packaging and deployment. The main initiative now is handling integration points between Ambari and Apache Bigtop, with Bigtop being the new way to optimize, validate, and bundle a set of Hadoop packages/services that work together. A Bigtop distribution specifies exact names, versions, and patches for each of the software components in the curated stack. The Ambari and Bigtop communities are in step and actively collaborating on development around these improvements.

Commit velocity has also seen a notable increase (see illustration below), and a new release is believed to be imminent.

Graph showing surge in Ambari contributions in Q4 2022

In the near future, the Ambari community will likely begin to tackle the gaps that separate Apache Ambari's capabilities from those of Cloudera Manager, which is currently the only viable competitor in the marketplace (commercial or otherwise).

For example, the Ambari Community will likely assess whether the management of multiple clusters through a single Ambari Server instance is a desirable feature among the customer base, as well as a worthwhile investment of time and resources. Cloudera Manager currently supports this, but some longtime users of these products have questioned whether it is an attractive feature or a strong differentiator. The clear line that Ambari draws between clusters by requiring the installation of a separate management interface is considered desirable in some circles, as it adds physical boundaries between clusters that likely benefit from true isolation (e.g. development, test, production clusters).

Final Thoughts

In the case of Ambari, the slowing pace of product development may have been misinterpreted by the PMC as low user demand, when it was not an accurate measure of the overall health of the product. It stands to reason that a mature open source project will have a slowing development curve as new features and functionality become less of a focus. In these cases, users may be satisfied with the working platform and become complacent about the need for ongoing maintenance contributions required to keep the project secure and compatible with the changing software world. It doesn’t appear this was considered by the committee before suggesting retirement.

The story of Apache Ambari is a testament to the power of open source — even after user demand seemed to have waned, the community was able to build enough momentum and support to bring the Apache Ambari project back to life. 
