The Okapi library

Okapi is an open source library of graph analytics and machine learning algorithms for the Giraph graph processing system, developed as part of the Grafos.ML project. It currently contains algorithms for collaborative filtering and graph mining. Our plan is to build a community around the project and enrich it with more toolkits. For more details, check out a nice post about Okapi by Claudio Martella, one of the contributors to the project.

Launching the Grafos.ML project

We’ve recently launched Grafos.ML, our new project on graph mining and machine learning. The goal of the project is to develop tools for large-scale graph mining and ML analytics. Our first effort is Okapi, a library of graph mining and machine learning algorithms developed for the Giraph graph processing system. Check out the site for more information.

Paper accepted to ACM SoCC’13

Our paper on debugging systems for data-intensive analytics has been accepted to the ACM Symposium on Cloud Computing. The paper presents Newt, a scalable architecture for capturing and querying data lineage information in order to find and resolve errors in processing pipelines.

Newt provides a flexible instrumentation API that allows system developers to collect fine-grained lineage from a range of data-intensive scalable computing (DISC) architectures. Newt pairs this API with a scale-out, fault-tolerant lineage store and query engine.
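
To give a rough feel for what capturing and querying lineage means, here is a toy sketch in Python. It is not Newt's API: the `LineageStore` class, its `record` and `trace_back` methods, and the record names are all hypothetical, and the store is a plain in-memory dictionary rather than a scale-out, fault-tolerant one. It only illustrates the general idea of recording input-to-output associations in each pipeline stage and walking them backwards from a suspicious output.

```python
from collections import defaultdict


class LineageStore:
    """Toy in-memory lineage store (hypothetical, not Newt's actual API)."""

    def __init__(self):
        # Maps an output record id to the set of (stage, input id) pairs
        # that produced it.
        self._parents = defaultdict(set)

    def record(self, stage, inputs, output):
        # An instrumented operator would call this each time it emits a record.
        for item in inputs:
            self._parents[output].add((stage, item))

    def trace_back(self, record_id):
        # Walk the associations backwards to collect every source record
        # that contributed to record_id.
        sources, seen, stack = set(), set(), [record_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            parents = self._parents.get(current)
            if not parents:
                # No recorded parents: treat this as an original input.
                sources.add(current)
                continue
            stack.extend(item for _, item in parents)
        return sources


store = LineageStore()
store.record("parse", ["line-1"], "rec-1")
store.record("parse", ["line-2"], "rec-2")
store.record("aggregate", ["rec-1", "rec-2"], "sum-1")

# Which raw input lines fed into the aggregate "sum-1"?
bad_inputs = store.trace_back("sum-1")
```

In this sketch, tracing back from a bad aggregate yields the raw input lines to inspect or replay, which is the kind of debugging question a lineage query is meant to answer.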

Until the camera-ready version is available, take a look at the technical report here.