3rd Workshop on Algorithms and Systems for MapReduce and Beyond

* Call for papers *

BEYONDMR’16
3rd Workshop on Algorithms and Systems for MapReduce and Beyond, July 1, 2016.
https://sites.google.com/site/beyondmr2016/

Held in conjunction with SIGMOD 2016
San Francisco, USA, June 26th – July 1st, 2016
http://sigmod2016.org/

—————-
KEYNOTES
—————-

Speaker: Ion Stoica, AMPLab, University of California Berkeley

Title: Spark: Past, Present, and Future

Abstract: Almost six years ago we started the Spark project at UC Berkeley.
Spark is a cluster computing engine that is optimized for in-memory
processing, and unifies support for a variety of workloads, including
batch, interactive querying, streaming, and iterative computations. Spark
is now the most active big data project in the open source community, and
is already being used by over one thousand organizations. In this talk,
I’ll take a look back at Spark’s humble beginnings, discuss its current
status, and preview the new and exciting developments that are coming up.
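
As a rough illustration of the unified model the abstract describes (a sketch of our own against the public Spark RDD API, not material from the talk; the input path, object name, and the toy label-propagation loop are placeholder assumptions), the same cached dataset can serve both a batch-style aggregation and an iterative computation:

    import org.apache.spark.{SparkConf, SparkContext}

    object UnifiedWorkloadsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("beyondmr-sketch"))

        // Load an edge list once and keep it in memory; both workloads below
        // reuse the same cached RDD instead of re-reading the input.
        val edges = sc.textFile("hdfs:///tmp/edges.txt")   // placeholder path
          .map(_.split("\\s+"))
          .map(a => (a(0), a(1)))
          .cache()

        // Batch-style aggregation: out-degree per source vertex.
        val degrees = edges.mapValues(_ => 1L).reduceByKey(_ + _)
        degrees.take(10).foreach(println)

        // Iterative workload: a few rounds of minimum-label propagation
        // (a toy connected-components-style loop over the cached edges).
        var labels = edges.keys.distinct().map(v => (v, v)).cache()
        for (_ <- 1 to 5) {
          val msgs = edges.join(labels).map { case (_, (dst, lbl)) => (dst, lbl) }
          labels = labels.union(msgs).reduceByKey((a, b) => if (a < b) a else b).cache()
        }
        labels.take(10).foreach(println)

        sc.stop()
      }
    }

Streaming and SQL-style querying run on the same engine via Spark Streaming and Spark SQL, which is the sense in which the workloads are "unified"; the sketch above shows only the batch and iterative cases.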

Speaker: Carlos Guestrin, University of Washington

Title: Big Data, Small Cluster: Choosing “big memory” (RAM, disks, SSDs) over big clusters

Abstract: TBA

—————-
WORKSHOP FOCUS
—————-

The third BeyondMR workshop aims to explore algorithms, computational
models, architectures, languages and interfaces for systems that need
large-scale parallelization and systems designed to support efficient
parallelization and fault tolerance. These include specialized programming
and data-management systems based on MapReduce and its extensions, graph
processing systems, and data-intensive workflow and dataflow systems.

We invite submissions on topics such as

Frameworks for Large-Scale Analytical Processing:
– Models, architectures and languages for data processing pipelines,
data-intensive workflows, DAGs of operations/MapReduce jobs, dataflows,
and data-mashups.
– Extensions of MapReduce with additional fundamental functions beyond Map
and Reduce, and more complex dataflow connections between function inputs
and outputs.
– Expressing and parallelizing iterations, incremental iterations, and
programs consisting of large DAGs of operations.
– Approaches to achieving fault tolerance and to recovering from failures.

Algorithms for Large-Scale Data Processing:
– Methods and techniques for designing efficient algorithms for MapReduce
and similar systems.
– Experiments and experience with new algorithms in these settings.

Cost Models and Optimization Techniques:
– Formal definitions of models that evaluate the efficiency of algorithms
in large-scale parallel processing systems taking into account the
requirements of such systems in different applications.
– Testing and benchmarking of MapReduce extensions and data-intensive
workflows.

Resource Management for Many-Task Computing:
– Scheduling of tasks and load-balancing techniques.
– Methods to tackle data skewness.
– Study of cases where automatic data distribution in MapReduce and
similar systems does not provide sufficient data balancing.
– Design of algorithms that avoid skewness.
– Extensions of MapReduce that automatically tackle data skewness.

—————-
IMPORTANT DATES
—————-
Paper submission deadline: Sat March 5, 2016
Author notification: Mon April 11, 2016
Deadline for camera-ready copy: Sun May 1, 2016
Workshop: Fri July 1, 2016

—————-
SUBMISSION GUIDELINES
—————-
We invite full research or experience papers (up to 10 pages), or short
papers (up to 4 pages) describing research in progress, formatted using
the ACM double-column style
(http://conferences.sigcomm.org/imc/2009/sig-alternate-10pt.cls)

—————-
PUBLICATION
—————-
The workshop proceedings will be published in the ACM Digital Library, and the organizers will prepare a SIGMOD Record report.

—————————
ORGANIZERS
—————————
Foto Afrati (National Technical University of Athens, Greece)
Jan Hidders (TU Delft, The Netherlands)
Christopher Re (Stanford, USA)
Jacek Sroka (University of Warsaw, Poland)
Jeffrey Ullman (Stanford University)

—————————
Program Committee (in progress)
—————————

– Chris Re, Stanford University (PC chair)
– Foto Afrati, National Technical University of Athens
– Jeffrey Ullman, Stanford University
– Jacek Sroka, University of Warsaw
– Jan Hidders, Delft University of Technology
– Zhengkui Wang, Singapore Institute of Technology
– Khalid Belhajjame, PSL, Universite Paris-Dauphine, LAMSADE
– Sourav Bhowmick, Nanyang Technological University
– Graham Cormode, University of Warwick
– Asterios Katsifodimos, Technical University of Berlin
– Paris Koutris, University of Washington
– Dionysios Logothetis, Facebook
– Frank McSherry, ETH Zurich
– Krzysztof Onak, IBM Research
– Mark Santcroos, Rutgers University
– Gautam Shroff, Tata Consultancy Services R&D
– Dan Suciu, University of Washington
– Jianwu Wang, University of Maryland, Baltimore County
– Tim Kraska, Brown University
– Krzysztof Rzadca, University of Warsaw
– Semih Salihoglu, Stanford University
– Ulf Leser, Humboldt-Universität zu Berlin
– Fabio Porto, National Laboratory of Scientific Computation, Brazil
– Eiko Yoneki, University of Cambridge
– Umut Acar, Carnegie Mellon University
– Daniel De Oliveira, Fluminense Federal University
– Tamer Özsu, University of Waterloo
– Anthony Tung, National University of Singapore
– Sergei Vassilvitskii, Google
– Yogesh Simmhan, Indian Institute of Science, Bangalore

Paper accepted at NDSS’15

Our paper on identifying fake accounts in Online Social Networks has been accepted at the 2015 Network and Distributed System Security (NDSS’15) Symposium.

The paper makes the observation that victims (benign users with real accounts who have befriended fakes) form a distinct classification category that is useful for designing robust fake-account detection mechanisms.

You can find more information on the work here and a copy of the paper here.

Submit your work to ParLearning’15

4th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics

CALL FOR PAPERS

Scaling up machine learning (ML), data mining (DM) and reasoning algorithms from Artificial Intelligence (AI) to massive datasets is a major technical challenge in the era of “Big Data”. The past ten years have seen the rise of multi-core and GPU-based computing. In distributed computing, frameworks such as Mahout, GraphLab and Spark continue to appear, facilitating the scaling up of ML/DM/AI algorithms through higher levels of abstraction. We invite novel work that advances the three fields of ML/DM/AI through the development of scalable algorithms or computing frameworks. Ideal submissions can be characterized as scaling up X on Y, where potential choices for X and Y are provided below; a brief sketch of one such pairing follows the lists.

Scaling up

  • recommender systems
  • gradient descent algorithms
  • deep learning
  • sampling/sketching techniques
  • clustering (agglomerative techniques, graph clustering, clustering heterogeneous data)
  • classification (SVM and other classifiers)
  • SVD
  • probabilistic inference (Bayesian networks)
  • logical reasoning
  • graph algorithms and graph mining

On

  • Parallel architectures/frameworks (OpenMP, OpenCL, Intel TBB)
  • Distributed systems/frameworks (GraphLab, Hadoop, MPI, Spark etc.)
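
To make the "scaling up X on Y" framing concrete, here is a minimal hedged sketch of one such pairing, gradient descent on Spark, written against the public RDD API (the input path, file format, learning rate and iteration count are placeholder assumptions, and real submissions would of course go well beyond this). Each iteration performs one distributed pass over the cached data to compute the full-batch gradient for least-squares linear regression:

    import org.apache.spark.{SparkConf, SparkContext}

    object ScalingGradientDescentSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parlearning-sketch"))

        // Each input line: a label followed by feature values (whitespace-separated);
        // placeholder path and format.
        val data = sc.textFile("hdfs:///tmp/points.txt")
          .map(_.split("\\s+").map(_.toDouble))
          .map(a => (a.head, a.tail))
          .cache()

        val dim = data.first()._2.length
        val n   = data.count().toDouble
        val lr  = 0.01                     // placeholder learning rate
        var w   = Array.fill(dim)(0.0)

        // Full-batch gradient descent: each iteration is one distributed
        // map/reduce pass over the cached dataset.
        for (_ <- 1 to 50) {
          val grad = data.map { case (y, x) =>
            val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
            x.map(_ * err)                 // per-example gradient contribution
          }.reduce((g1, g2) => g1.zip(g2).map { case (a, b) => a + b })
          w = w.zip(grad).map { case (wi, gi) => wi - lr * gi / n }
        }

        println("learned weights: " + w.mkString(", "))
        sc.stop()
      }
    }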

2nd Workshop on Algorithms and Systems for MapReduce and Beyond

* Call for papers *

BEYONDMR’15
2nd Workshop on Algorithms and Systems for MapReduce and Beyond, March 27, 2015.
https://sites.google.com/site/beyondmr2015/

Held in conjunction with EDBT/ICDT 2015
Brussels, Belgium, March 23-27, 2015
http://edbticdt2015.be

—————-
WORKSHOP FOCUS
—————-
The second BeyondMR workshop aims to explore algorithms, computational models, architectures, languages and interfaces for systems that need large-scale parallelization and systems designed to support efficient parallelization and fault tolerance. These include specialized programming and data-management systems based on MapReduce and its extensions, graph processing systems, and data-intensive workflow and dataflow systems.

We invite submissions on topics such as

Frameworks for Large-Scale Analytical Processing:
– Models, architectures and languages for data processing pipelines, data-intensive workflows, DAGs of operations/MapReduce jobs, dataflows, and data-mashups.
– Extensions of MapReduce with additional fundamental functions beyond Map and Reduce, and more complex dataflow connections between function inputs and outputs.
– Expressing and parallelizing iterations, incremental iterations, and programs consisting of large DAGs of operations.
– Approaches to achieving fault tolerance and to recovering from failures.

Algorithms for Large-Scale Data Processing:
– Methods and techniques for designing efficient algorithms for MapReduce and similar systems.
– Experiments and experience with new algorithms in these settings.

Cost Models and Optimization Techniques:
– Formal definition of models that evaluate the efficiency of algorithms in large-scale parallel processing systems taking into account the requirements of such systems in different applications.
– Testing and benchmarking of MapReduce extensions and data-intensive workflows.

Resource Management for Many-Task Computing:
– Scheduling of tasks and load-balancing techniques.
– Methods to tackle data skewness.
– Study of cases where automatic data distribution in MapReduce and similar systems does not provide sufficient data balancing.
– Design of algorithms that avoid skewness.
– Extensions of MapReduce that automatically tackle data skewness.

—————-
IMPORTANT DATES
—————-
Paper submission deadline: Dec 11th, 2014
Author notification: Jan 7th, 2015
Deadline for camera-ready copy: Jan 20, 2015
Workshop: March 27, 2015

—————-
SUBMISSION GUIDELINES
—————-
We invite full research or experience papers (up to 10 pages), or short papers (up to 4 pages) describing research in progress, formatted using the ACM double-column style (http://conferences.sigcomm.org/imc/2009/sig-alternate-10pt.cls)

—————-
PUBLICATION
—————-
The workshop proceedings will be published together with the EDBT/ICDT 2015 workshop proceedings by CEUR Workshop Proceedings (CEUR-WS.org).

—————————
ORGANIZERS
—————————
Foto Afrati     (National Technical University of Athens, Greece)
Jan Hidders     (TU Delft, The Netherlands)
Frank McSherry  (formerly Microsoft Research)
Paolo Missier   (Newcastle University, UK)
Jacek Sroka     (University of Warsaw, Poland)
Jeffrey Ullman  (Stanford University)

—————————
Program Committee (in progress)
—————————

Umut Acar                               (CMU)
Khalid Belhajjame       (University Paris-Dauphine)
Sarah Cohen-Boulakia    (Universite Paris-Sud)
Asterios Katsifodimos   (TU Berlin)
Christoph Koch          (EPFL)
Dionysios Logothetis     (Telefonica Research)
Marta Mattoso           (Federal University of Rio de Janeiro)
Frank McSherry (Chair)  (formerly Microsoft Research)
Derek Murray            (formerly Microsoft Research)
Jelena Pjesivac-Grbovic (Google)
Christopher Re          (Stanford)
Krzysztof Rzadca        (University of Warsaw)
Piotr Sankowski         (University of Warsaw)
Mark Santcroos          (Rutgers)
Sergei Vassilvitskii    (Google)
Jianwu Wang             (UCSD)

The Okapi library

Okapi is an open source library of graph analytics and machine learning algorithms for the Giraph graph processing system, developed as part of the Grafos.ML project. Currently, it contains algorithms for collaborative filtering and graph mining. Our plan is to build a community around the project and enrich it with more toolkits. Check out a nice post with more details about Okapi from Claudio Martella, one of the contributors to the project.
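
For readers unfamiliar with the programming model involved, here is a framework-free conceptual sketch of the vertex-centric ("think like a vertex") style of computation that Giraph implements and that Okapi's algorithms are written against. This is plain Scala with a hard-coded toy graph, not Okapi or Giraph code; the object name, graph and iteration count are placeholder assumptions.

    object VertexCentricSketch {
      def main(args: Array[String]): Unit = {
        // Toy directed graph: vertex -> outgoing neighbours (placeholder data).
        val graph = Map(1 -> Seq(2, 3), 2 -> Seq(3), 3 -> Seq(1))
        val n = graph.size
        var rank = graph.keys.map(v => v -> (1.0 / n)).toMap

        // Each synchronous superstep: every vertex sends rank/outDegree to its
        // neighbours, then recomputes its value from the messages it received.
        for (_ <- 1 to 20) {
          val messages = graph.toSeq.flatMap { case (v, out) =>
            out.map(dst => dst -> (rank(v) / out.size))
          }
          val received = messages.groupBy(_._1).map { case (v, ms) => v -> ms.map(_._2).sum }
          rank = rank.map { case (v, _) => v -> (0.15 / n + 0.85 * received.getOrElse(v, 0.0)) }
        }

        rank.toSeq.sortBy(_._1).foreach { case (v, r) => println(s"vertex $v: $r") }
      }
    }

In a real Giraph or Okapi job, the per-vertex update and the message exchange are what the framework distributes and checkpoints; the toy loop above only mimics that structure on a single machine.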

Launching the Grafos.ML project

We’ve recently launched Grafos.ML, our new project on graph mining and machine learning. The goal of the project is to develop tools for large-scale graph mining and ML analytics. Our first effort is Okapi, a library of graph mining and machine learning algorithms developed for the Giraph graph processing system. Check out the site for more information.