Yava Hadoop is a 100% open source Big Data platform distribution that harnesses the power of the Apache Hadoop ecosystem and is designed to help accelerate the adoption of Hadoop and its ecosystem in Indonesia.

Data Store and Resource Manager

The reliability and linear scalability of HDFS provide storage for data in a variety of formats. By supporting both commodity servers and high-end servers, HDFS offers a broad selection of deployment options based on needs and available resources. The YARN cluster management system enables various processing engines to run on top of HDFS.

Data Access

With the support of YARN cluster management, Yava provides a wide range of data access and processing capabilities in a single cluster: batch, streaming, interactive, and real time.

MapReduce handles batch processing; Phoenix, Hive, and Tez provide SQL-based processing; Pig provides scripting; HBase serves NoSQL workloads; Solr provides search; Storm handles streaming; Spark provides in-memory processing; and Mahout supports data mining and machine learning. The YARN resource manager allows a single cluster to fulfill these different processing needs, avoiding costly and inconvenient duplication of data across dedicated clusters.
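To make the batch-processing model above concrete, here is a minimal sketch of a word-count job written for Hadoop Streaming, which lets plain scripts act as mapper and reducer by reading stdin and writing tab-separated key/value pairs to stdout. The script name and the single "map"/"reduce" argument convention are illustrative assumptions, not part of this document.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop Streaming word-count job.

Hadoop Streaming pipes input splits through the mapper, sorts the
mapper output by key, then pipes the sorted stream through the reducer.
"""
import sys
from itertools import groupby


def map_line(line):
    """Emit a (word, 1) pair for every word on one input line."""
    return [(word, 1) for word in line.strip().split()]


def reduce_pairs(pairs):
    """Sum counts per word; pairs must already be sorted by word,
    which Hadoop guarantees between the map and reduce phases."""
    return [(word, sum(count for _, count in group))
            for word, group in groupby(pairs, key=lambda kv: kv[0])]


if __name__ == "__main__":
    # Run the same script as either mapper or reducer.
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    if role == "map":
        for line in sys.stdin:
            for word, count in map_line(line):
                print(f"{word}\t{count}")
    else:
        pairs = [(word, int(count)) for word, count in
                 (line.rstrip("\n").split("\t") for line in sys.stdin)]
        for word, total in reduce_pairs(pairs):
            print(f"{word}\t{total}")
```

Such a script would typically be submitted with the `hadoop-streaming` JAR, passing it as both `-mapper` and `-reducer` with the appropriate role argument; the exact invocation depends on the installed Hadoop version.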


Apache Sqoop enables efficient data transfer between Hadoop and structured data stores such as Teradata, Netezza, Oracle, MySQL, Postgres, and HSQLDB.
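A typical Sqoop import looks like the following sketch; the JDBC URL, credentials, table, and target directory are placeholders, not values from this document, and the command requires a live cluster and database to run.

```shell
# Import a relational table into HDFS (all connection details are placeholders).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username reporter -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4
```

Sqoop translates this into a MapReduce job, so the number of mappers controls how many parallel connections are opened against the source database.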

Apache Flume is used for streaming large amounts of data into HDFS, such as logs from production machines or network devices. Flume provides a simple and flexible architecture with reliable failover and recovery mechanisms.
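Flume agents are wired together in a properties file as source, channel, and sink. The fragment below is a minimal, illustrative configuration (the agent name, log path, and HDFS path are assumptions) that tails an application log into HDFS.

```properties
# One agent: exec source -> memory channel -> HDFS sink.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

agent1.sources.src1.type     = exec
agent1.sources.src1.command  = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type     = memory
agent1.channels.ch1.capacity = 10000

agent1.sinks.sink1.type      = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/app-logs
agent1.sinks.sink1.channel   = ch1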

Cluster Administration

Apache Ambari is a framework for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides a simple and elegant user interface and can be integrated with existing operational tools such as Microsoft System Center and Teradata Viewpoint.

Apache ZooKeeper provides distributed configuration, synchronization, and naming registry services for distributed systems. ZooKeeper is used to store and manage critical configuration changes.
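Storing a configuration value in ZooKeeper amounts to writing a small znode that any client in the cluster can read and watch. The session below is an illustrative sketch using the `zkCli.sh` shell bundled with ZooKeeper; the paths and values are placeholders and require a running ensemble.

```shell
# Inside zkCli.sh, connected to the ensemble:
create /app/config/batch-size "500"   # publish a setting
get /app/config/batch-size            # any client can read it back
set /app/config/batch-size "1000"     # update; watchers are notified
```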

Apache Oozie provides workflow scheduling tools to manage jobs in Enterprise Hadoop.
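An Oozie workflow is declared as an XML graph of actions. The fragment below is a hedged sketch of a workflow with a single MapReduce action; the application name, queue, and the `${jobTracker}`/`${nameNode}` parameters are placeholders supplied at submission time.

```xml
<workflow-app name="daily-etl" xmlns="uri:oozie:workflow:0.5">
  <start to="run-job"/>
  <action name="run-job">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapreduce.job.queuename</name>
          <value>default</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```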

Workflow Designer

HGrid247 is a comprehensive ETL process designer. Its ease of use and intuitive drag-and-drop interface reduce the time and complexity of data integration. By eliminating hand-written Java or other programming for MapReduce, Spark, Storm, and Tez jobs and scripts, HGrid247 empowers developers and designers to build big data jobs with visual tools. This speeds up the work and increases team productivity, because ETL developers and designers can focus on the design of the process.