Yahoo Releases its Distribution of Hadoop

Internet powerhouse Yahoo, just made its distribution of Hadoop available while hosting the Second Annual Hadoop Summit.  Under the Apache Hadoop project, Hadoop is a distributed file system and parallel execution environment that allows users to process large amounts of data.  Acting on requests from the user community, Yahoo decided to open up its investment in quality engineering for the benefit of the larger ecosystem and aid in the innovation of open collaborative research and development.  Its distribution of Hadoop has been thoroughly tested and deployed on the world’s largest Hadoop clusters.

Justin Erenkrantz, Apache Software Foundation President, noted his excitement to see how the Apache Hadoop project has progressed over the last few years.  He says the foundation is looking forward to witnessing the community’s growth accelerate with Yahoo’s continuous support and commitment to quality, factors that will ultimately attract more contributors to the Apache Hadoop project.

RELATED:   The Buzz Surrounding FireHost

Mike Olson, CEO of Cloudera, states that the general availability of Yahoo’s Hadoop source code will make its own distribution of Hadoop more robust and scalable.  He says Cloudera will continue to work with Yahoo to implement the tried and tested source code into its commercial distribution.  According to Olson, the Cloudera distribution of Hadoop is an enterprise-ready version complete with key tools, utilities, customer service and technical support, features designed to help enterprises deploy and maintain the Hadoop platform for larger scale data processing and management needs.

Considered one of the pioneers, Yahoo has led the way in development and is now the leading contributor to the Apache Hadoop project.  Doug Cutting, Hadoop founder, joined Yahoo in 2006 and the company immediately began making significant investments in developing the project.  Hadoop currently underpins several of Yahoo’s properties, including Yahoo Search, Yahoo mail and a variety of its content and advertising services.  In Yahoo data centers, Hadoop runs on well over 25,000 servers and analyzes billions of web pages, petabytes of storage and billions of new records on a daily basis.

RELATED:   A Closer Look into Siteframe

To demonstrate the community collaboration of the Hadoop project, Yahoo hosted the second annual Hadoop Summit on June 10 in Santa Clara, California.  Co-sponsoring the event was Cloudera, Amazon Web Services, IBM, Sun Startup Essentials and Lightspeed Venture Partners.  The event brought hundreds of members from the Hadoop user and developer communities together to learn and share ideas with each other.  The Hadoop Summit focused on all the advancements made in the development and deployment of the project and its related technologies.  It also touched on case studies of the applications being created and deployed on Hadoop as well as discussions on the platforms’ future directions.

RELATED:   The cPanel Advanced Menu

Yahoo’s distribution of Hadoop is available via the Yahoo Developer Network.  Cloudera’s distribution, which includes the Yahoo source code, is available at

About Yahoo

Headquartered in Sunnyvale, California, Yahoo is a globally renown internet brand and leading web hosting provider.  The company is dedicated to empowering its community of customers, users, publishers, advertisers, and developers by establishing indispensable services based on trust and reliability.  The Yahoo brand is currently one of the most trafficked internet destinations in the world.

One comment on “Yahoo Releases its Distribution of Hadoop

  • agi@girlsandnylons

    Do you have any idea what I am talking about?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>