Symbiosis in Scale Out Networking and Data Management

Amin Vahdat
Google & University of California, San Diego


This talk highlights the symbiotic relationship between data management and networking through a study of two seemingly independent trends in traditionally separate communities: large-scale data processing and software defined networking. First, data processing at scale increasingly runs across hundreds or thousands of servers. We show that balancing network performance with computation and storage is a prerequisite for both efficient and scalable data processing. We illustrate the need for scale out networking in support of data management through a case study of TritonSort, currently the record holder for several sorting benchmarks, including GraySort and JouleSort. Our TritonSort experience shows that disk-bound workloads require 10 Gb/s of provisioned bandwidth to keep up with modern processors, while emerging flash workloads require 40 Gb/s fabrics at scale.
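The balancing argument above can be made concrete with back-of-envelope arithmetic. The sketch below is a minimal illustration, assuming that in a distributed sort every byte read from storage crosses the network once during the shuffle. The device counts and per-device throughputs are hypothetical, not TritonSort's actual hardware, and `storage_gbps` and `bottleneck` are illustrative names.

```python
# Back-of-envelope check: can a server's network link keep pace with its storage?
# All hardware figures are hypothetical illustrations, not TritonSort's configuration.


def storage_gbps(num_devices: int, mb_per_sec_each: float) -> float:
    """Aggregate sequential storage throughput, converted from MB/s to Gb/s."""
    return num_devices * mb_per_sec_each * 8 / 1000


def bottleneck(num_devices: int, mb_per_sec_each: float, nic_gbps: float) -> str:
    """Return "network" if the link is slower than storage, else "storage".

    Simplifying assumption: every byte read from storage crosses the network
    once (as in the shuffle phase of a distributed sort), so a balanced server
    needs at least as much network bandwidth as storage bandwidth.
    """
    return "network" if nic_gbps < storage_gbps(num_devices, mb_per_sec_each) else "storage"


if __name__ == "__main__":
    # Hypothetical disk server: 12 disks at 100 MB/s each -> 9.6 Gb/s of storage bandwidth.
    for nic in (1, 10):
        print(f"disks, {nic} Gb/s link: bottleneck = {bottleneck(12, 100, nic)}")
    # Hypothetical flash server: 8 SSDs at 500 MB/s each -> 32 Gb/s of storage bandwidth.
    for nic in (10, 40):
        print(f"flash, {nic} Gb/s link: bottleneck = {bottleneck(8, 500, nic)}")
```

With these made-up figures, a 1 Gb/s link throttles the disk configuration while 10 Gb/s does not, and the flash configuration needs more than 10 Gb/s. This is the same qualitative contrast the abstract draws between disk-bound and flash workloads.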

We next argue that data management techniques are needed to enable Software Defined Networking (SDN) and Scale Out Networking. SDN promises the abstraction of a single logical network fabric rather than a collection of thousands of individual boxes. In turn, scale out networking allows network capacity (ports, bandwidth) to be expanded incrementally, rather than by wholesale fabric replacement. However, SDN requires an extensible model of both static and dynamic network properties and the ability to deliver dynamic updates to a range of network applications in a fault tolerant and low latency manner. Doing so in networking environments where updates are typically performed by timer-based broadcasts and models are specified as comma-separated text files processed by one-off scripts presents interesting challenges. We consider an environment in which applications ranging from routing and traffic engineering to monitoring and intrusion/anomaly detection all essentially reduce to inserting, triggering, and retrieving updates in a shared, extensible data store.
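The data-store view of SDN described above can be sketched as a key-value store with triggers. The class below is a minimal, single-process illustration under stated assumptions: `NetworkStateStore`, its `insert`/`retrieve`/`subscribe` methods, and the tuple key layout such as `("link", "s1", "s2")` are all hypothetical. It deliberately omits the fault tolerance and low-latency delivery the abstract calls for, and shows only the pattern of applications pushing updates and reacting to triggers instead of waiting for timer-based broadcasts.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Key = Tuple[str, ...]
Callback = Callable[[Key, Any], None]


class NetworkStateStore:
    """Hypothetical in-memory store of network state with prefix-based triggers.

    Not fault tolerant and not distributed; illustrates the data model only.
    """

    def __init__(self) -> None:
        self._data: Dict[Key, Any] = {}
        self._triggers: Dict[Key, List[Callback]] = defaultdict(list)

    def subscribe(self, prefix: Key, callback: Callback) -> None:
        """Register a callback fired on every insert whose key starts with prefix."""
        self._triggers[prefix].append(callback)

    def insert(self, key: Key, value: Any) -> None:
        """Store the value, then fire the triggers of every prefix of the key."""
        self._data[key] = value
        for i in range(len(key) + 1):
            for callback in self._triggers.get(key[:i], []):
                callback(key, value)

    def retrieve(self, prefix: Key) -> Dict[Key, Any]:
        """Return all entries whose key starts with prefix."""
        n = len(prefix)
        return {k: v for k, v in self._data.items() if k[:n] == prefix}


# Usage: a hypothetical routing application reacts to link failures
# published by a hypothetical monitoring application.
store = NetworkStateStore()
reroutes: List[Key] = []


def on_link_change(key: Key, value: Any) -> None:
    if value == "down":
        reroutes.append(key)  # a real router would recompute paths here


store.subscribe(("link",), on_link_change)
store.insert(("link", "s1", "s2"), "up")
store.insert(("link", "s2", "s3"), "down")
print(reroutes)  # [('link', 's2', 's3')]
```

Keying triggers by prefix lets one application watch a whole class of state (all links) while another writes individual entries, which is the decoupling the abstract attributes to a shared, extensible store.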


Amin Vahdat is a Distinguished Engineer at Google working on data center and wide-area networking. He is also a Professor and holds the Science Applications International Corporation Chair in the Department of Computer Science and Engineering at the University of California San Diego. Vahdat's research focuses broadly on computer systems, including distributed systems, networks, and operating systems. He received a PhD in Computer Science from UC Berkeley under the supervision of Thomas Anderson after spending the last year and a half of his doctoral studies as a Research Associate at the University of Washington. Vahdat is an ACM Fellow and a past recipient of the NSF CAREER award, the Alfred P. Sloan Fellowship, and the Duke University David and Janet Vaughn Teaching Award.


Automated Machine Learning For Autonomic Computing

Subutai Ahmad
VP Engineering, Numenta


We are witnessing an explosion in the amount of data generated. Every server, device, and system is able to generate a stream of information that is both valuable and ever changing. It is becoming insufficient to simply store the data for later analysis and modeling. Instead there is a growing need to stream data to adaptive models and take instant action. This type of online system imposes hard constraints that the field of machine learning has not addressed. The systems must be highly automated, adapt to changing statistics, handle temporal data, and work well across a wide range of inputs. In this talk I will go over these issues and how they impact adaptive systems. I will describe a new technology for streaming analytics and illustrate how this technology works in a practical product called Grok. Using Grok, I will show how streaming analytics can be applied to problems such as predictive maintenance, server capacity planning, and cluster health monitoring. As the number of data sources increases, adaptive streaming solutions will play an increasingly important role in the future of autonomic computing.
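The abstract does not describe the technology inside Grok, so the sketch below substitutes a much simpler and different technique to illustrate what it means for a streaming model to adapt to changing statistics: exponentially weighted estimates of mean and variance, updated one sample at a time, with each sample scored before it is absorbed. `EwmaAnomalyScorer` and its smoothing factor `alpha` are hypothetical names, not part of any Numenta product.

```python
import math
from typing import Optional


class EwmaAnomalyScorer:
    """Online anomaly scorer built on exponentially weighted mean and variance.

    A simple stand-in for illustration, not Grok's algorithm. `alpha` sets how
    fast old statistics are forgotten (larger = faster adaptation).
    """

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha
        self.mean: Optional[float] = None
        self.var = 0.0

    def update(self, x: float) -> float:
        """Score x against the current model, then fold x into the model.

        Returns |x - mean| / std, or 0.0 while no spread has been observed yet.
        """
        if self.mean is None:  # the first sample only initializes the model
            self.mean = x
            return 0.0
        diff = x - self.mean
        std = math.sqrt(self.var)
        score = abs(diff) / std if std > 0 else 0.0
        # Incremental exponentially weighted update of mean and variance.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return score


scorer = EwmaAnomalyScorer(alpha=0.1)
for x in [10.0, 12.0] * 25:  # steady regime around 11
    scorer.update(x)
spike = scorer.update(40.0)  # a sudden jump scores high
after = [scorer.update(x) for x in [40.0, 42.0] * 50]  # the new regime persists
print(f"spike score {spike:.1f}, final score {after[-1]:.1f}")
```

No retraining step is needed: a larger `alpha` forgets old data faster, trading stability for quicker adaptation, and once the stream settles at a new level the scores for that level fall back toward typical values as the model catches up.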


Subutai Ahmad brings experience in real-time systems, computer vision, and machine learning. At Numenta, Subutai oversees technology and product development. Prior to Numenta, Subutai served as VP Engineering at YesVideo, Inc. He helped grow YesVideo from a three-person start-up to a leader in automated digital media authoring. YesVideo's real-time video analysis systems have been deployed internationally on a variety of platforms: large scale distributed clusters, retail minilabs, and set-top boxes. Subutai holds a Bachelor's degree in Computer Science from Cornell University, and a PhD in Computer Science from the University of Illinois at Urbana-Champaign.


High Efficiency at Web Scale

Eitan Frachtenberg
Research Scientist, Facebook

Every day, over half a billion people log in to Facebook to communicate with their contacts. They exchange more than 300 million photos and more than 3 billion likes and comments each day. And almost every day, Facebook releases new code with new features and products to all these users. This staggering amount of information and processing is served from dozens of clusters in four geographical regions. The keys to operating successfully at this almost incomprehensibly large scale are efficiency and automation. Efficiency starts with the hundreds of Facebook engineers and the processes they use to develop, test, and deploy code; it continues with scalable models for distributing and constantly monitoring the software on tens of thousands of servers on a daily basis; and it ends with the very hardware and datacenters that serve this data, bringing capital and operational expenditures down to make the economic model viable. Automation is the leverage behind each of these relatively few engineers. It lets them focus on quick iteration and experimentation, catching problems early and solving many automatically. This talk will describe the challenges of developing and operating a product that serves a significant percentage of the worldwide internet population. Through several examples, we will see how efficiency and automation drive and enable operation at Web scale.


Eitan Frachtenberg is a Research Scientist at Facebook, where he focuses on power-efficient computing at scale. Prior to Facebook, he held research positions at Microsoft, Powerset, and Los Alamos National Laboratory. His research interests include scalable and parallel computing, performance evaluation and optimization, and parallel job scheduling. Eitan holds a PhD in Computer Science from the Hebrew University in Jerusalem, Israel.