Last year’s Spark Summit Europe took place in Brussels, Belgium, the heart of the European institutions. The three-day event, held in October 2016, featured some good talks.
One of those was from Tim Hunter, a software engineer at Databricks, the data science and engineering company behind the distributed computing framework Spark. Tim’s talk was about how Spark is evolving to integrate with deep learning frameworks and libraries. The result of that effort is called TensorFrames, and Tim Hunter provides the details in the video below:
This talk is another useful addition to the picture of current developments in deep learning implementations, and of the growing activity among hardware companies providing the Graphics Processing Units (GPUs) needed for the best compute performance of deep learning frameworks.
Tim starts by introducing the main kinds of numerical computation that Spark supports, and explains TensorFlow’s directed-acyclic-graph methodology of turning those numerical computations into efficient low-level operations.
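The graph methodology can be illustrated with a toy sketch (this is illustrative pure Python, not the actual TensorFlow API): a computation is first built as a DAG of symbolic nodes, and numbers only flow through it when the graph is explicitly executed.

```python
# Toy illustration of the deferred-graph idea: build a DAG of symbolic
# operations first, then evaluate it in a separate step, the way a
# TensorFlow session resolves a graph into low-level operations.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def const(v): return Node("const", value=v)
def add(a, b): return Node("add", (a, b))
def mul(a, b): return Node("mul", (a, b))

def run(node):
    """Evaluate the DAG bottom-up; nothing is computed until run() is called."""
    if node.op == "const":
        return node.value
    vals = [run(n) for n in node.inputs]
    return vals[0] + vals[1] if node.op == "add" else vals[0] * vals[1]

# (x * y) + z described as a graph, then executed
x, y, z = const(3.0), const(4.0), const(2.0)
graph = add(mul(x, y), z)
print(run(graph))  # 14.0
```

Because the whole computation is visible as a graph before execution, an engine is free to reorder, fuse, or offload the operations (to a GPU, for instance) before any data flows through.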
In the talk Tim walks through a Spark cluster example and demonstrates how joining the DataFrame environment with the TensorFlow environment yields TensorFrames. Next we learn that this compute-intensive framework actually faces a communication-of-information problem. Tim explains beautifully how, within a computation, objects from different languages must be converted and communicated through the Spark engine. This introduces inefficiencies, which TensorFrames resolves by embedding the TensorFlow library within Spark’s JVM and its executor processes.
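The inefficiency Tim describes can be sketched in miniature (a toy pure-Python model, not the real TensorFrames API): instead of converting and shipping data row by row between language runtimes, the numerical function is handed a whole column block at once inside the executor.

```python
# Toy model of the block hand-off idea: apply a whole-block numerical
# function once per partition block, mimicking how embedding the compute
# library in the executor avoids per-row conversion overhead.

def map_block(block, graph_fn):
    """One hand-off for the whole block, rather than one per row."""
    return graph_fn(block)

rows = [1.0, 2.0, 3.0, 4.0]

# The inefficient path described in the talk: one conversion per row.
per_row = [x * 2.0 + 1.0 for x in rows]

# The block path: a single crossing of the language boundary.
block_result = map_block(rows, lambda xs: [x * 2.0 + 1.0 for x in xs])

assert per_row == block_result  # same answer, far fewer boundary crossings
```

The results are identical; what changes is the number of times data must be serialized and handed between runtimes, which is exactly the cost the talk identifies.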
Next, Tim explains how to implement kernel density scoring with TensorFrames. The advanced mathematics involved is worth taking seriously: it is the kind of material that, as Tim Hunter’s advisor hinted, any serious scientific computing engineer or data scientist should be at least minimally familiar with when presenting their work (demonstrating deep knowledge of the math involved is a plus).
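For readers unfamiliar with the technique, here is a minimal standalone sketch of Gaussian kernel density scoring in plain Python (an illustration of the idea only, not the TensorFrames implementation from the talk): the score of a point is the average of Gaussian kernels centered on the sample points.

```python
import math

def gaussian_kde_score(samples, x, bandwidth=1.0):
    """Score point x under a Gaussian kernel density estimate of `samples`.

    Each sample contributes a Gaussian bump of width `bandwidth`;
    the density at x is the normalized sum of those bumps.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )

samples = [0.0, 1.0, 2.0]
print(gaussian_kde_score(samples, 1.0, bandwidth=0.5))
```

In a distributed setting the sum over samples is exactly the kind of embarrassingly parallel numerical reduction that a block-oriented framework handles well, which is presumably why it makes a good TensorFrames demonstration.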
Finally, a hint as to what the future might have in store for the TensorFrames story. Beyond improving the communication of data within Spark, there is also work to be done on memory integration with the Tungsten compute engine, as well as better integration with MLlib data types, leveraging the capabilities of TensorFlow and TensorFrames in one swoop: