O’Reilly Media is perhaps the best technology media and publishing company covering the US technology scene at the moment, at least judged by the quality and breadth of its offering. That offering spans digital media content, ranging from courses to online e-books, and bespoke web content, ranging from newsletters to web pages with featured writers who are always experts in their respective fields. Also of note is the wide range of topics O’Reilly covers, almost all of them within Information Technology, Computer Science, Data Science and Artificial Intelligence. The offering is quite difficult to get a grip on, but it is worth the effort.
Today I would like to share one of those quality offerings from O’Reilly Media. This time it is a podcast, aptly named The O’Reilly Data Show Podcast, and the episode covers the growing importance of Deep Learning, not just for the traditional science and technology audience but also for a business audience. Deep Learning techniques are finding wide acceptance in many business fields. This shouldn’t surprise us, as the technology is front and center in the modern empirical approach to Artificial Intelligence. Note also that, in spite of this, Deep Learning is still just a subfield of Machine Learning, and Machine Learning is regarded by many as a subfield of Computer Science (a somewhat controversial view, I would say, but one I agree with).
Having said this, let us dig into this week’s O’Reilly Data Show Podcast. Entitled Why Businesses should pay attention to Deep Learning, the show is a nice conversation with one of the masters in this field. Having witnessed the emergence of the now trendy Open Source wave, with the appearance of Apache Spark and the explosion of free access to large datasets and time series from a wide range of sources, Christopher Nguyen, CEO and co-founder of Arimo, is well positioned to give us a proper view of the current state of Deep Learning applied to business, and to highlight the current limitations of the technology and whether its future will be as mainstream and bright as its most prominent practitioners forecast. My side in these predictions isn’t important, but I guess readers of The Information Age are in an easy position to tell which one it is… I hope everyone enjoys the Podcast:
I would also highlight some of the most significant parts of the Podcast:
The early days of Arimo and Apache Spark
When we started Arimo (our company name then was Adatao), the vision was about big data and machine learning. At the time, the industry had just refactored itself into what I call the ‘big data layer’—big data in the sense of the layer at the bottom, the storage layer. I knew there needed to be the ‘big compute’ layer. This was obvious from looking from the Google perspective—there’s a need for a big compute system. But a big compute system didn’t yet exist outside Google, so we were going to build one at Adatao in order to enable applications on top of it.
I knew that a big compute system had to take advantage of memory (‘in-memory’), and memory costs had been dropping to a level where it could be adopted in large quantities. The timing was right, and that helped the decision on Spark. When we did a survey before we started architecting and building the system, we came across many, many systems, and Spark stood out in particular; we looked at probably 15 different permutations out there and found that Spark had the right architecture. We were quite excited.
… I’m a big proponent of another AMPLab project, Alluxio (formerly known as Tachyon). My take on Alluxio is that today, we tend to think of it as a memory-based distributed storage. I think the future of Alluxio is much brighter when you flip the adjective and the noun and say that it’s actually a storage-backed distributed memory system, a shared memory. When we have full data-center-scale computing, we will need a shared memory layer to serve as the shared memory for all compute units.
Deep learning can be applied to many business problems
Companies should care about deep learning because it will increasingly become the critical competitive weapon. It is a machine learning technique on the one hand, but it’s also going to be encompassing all of society. You hear a lot about AI advancements and so on. I think it behooves companies to, at the very least, pay attention to it, begin to apply it, and have it as part of their DNA going forward. Because, unlike many things where there are a hundred things to bet on, there are a few things that you know very clearly. Given the right mechanics and the right perspective, you know that deep learning is going to be the way of the future. I can tell you deep learning is definitely part of it.
… There are a lot of classes of problems that apply to all companies. For example, not every company has an image-recognition problem of scale, but I’ll bet every company has transactional data, time series, a transaction log. There’s a lot of insight you can gain as there are a lot of patterns hidden in that transaction log (the time-series data), that companies can learn from. Intuitively, you know the patterns are in there. Either you have some basic tools to discern those patterns or you don’t have tools at all. Deep learning is a way to extract insight from those patterns and make predictions about the next likely behavior—for example, the probability of purchase or future cash flows from transaction data.
With deep learning, particularly recurrent neural networks like Long Short-term Memories (LSTM), relatively new applied techniques that can model time series in a much more natural way, you don’t have to specify arbitrary windows. You don’t have to look for five-and-a-half or six-day patterns. With these recurrent networks, you’re able to feed all of the time series into the network, and it’ll figure out where the patterns are. It actually does (relatively speaking) reduce the need for the staff that were needed for other techniques.
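To make the contrast in that last excerpt a bit more concrete, here is a minimal sketch in plain Python. It is my own illustration, not something from the podcast or from Arimo. It assumes a plain recurrent cell with random, untrained weights (a much simpler relative of an LSTM), and the function names and the toy transaction amounts are all hypothetical. The only point is that the windowed approach needs a window size chosen up front, while the recurrent cell reads a series of any length step by step and carries its state along.

```python
import math
import random


def window_features(series, window):
    """Fixed-window approach: the analyst must pick `window` in advance."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def make_cell(hidden_size, seed=0):
    """Random, untrained weights for a tiny recurrent cell (illustration only)."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    return w_in, w_rec


def run_recurrent(series, cell):
    """Feed the whole series one value at a time, carrying the hidden state."""
    w_in, w_rec = cell
    size = len(w_in)
    hidden = [0.0] * size
    for x in series:
        hidden = [
            math.tanh(w_in[j] * x
                      + sum(w_rec[j][k] * hidden[k] for k in range(size)))
            for j in range(size)
        ]
    return hidden


# Hypothetical transaction amounts for two customers with logs of different length.
short_log = [12.0, 0.0, 7.5]
long_log = [12.0, 0.0, 7.5, 3.0, 0.0, 0.0, 20.0, 1.5]

cell = make_cell(hidden_size=4)
state_short = run_recurrent(short_log, cell)  # summary of 3 steps
state_long = run_recurrent(long_log, cell)    # summary of 8 steps, same size

weekly_style = window_features(long_log, window=3)  # window chosen by hand
```

Both logs end up summarized as a state vector of the same size, with no window parameter anywhere in `run_recurrent`. In a real setting the weights would be learned from data, and an LSTM adds gating on top of this basic loop so that longer-range patterns are easier to retain.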
Finally, it is worth mentioning Nguyen’s connection with the technological innovation landscape in China (there is a link to a video of the speaker at the Strata + Hadoop World conference held in Beijing in 2016). This is significant for the obvious reason of the prominence China currently displays on a global scale in business and economic issues. I am pretty sure this prominence will not slip away in the foreseeable future…
Featured Image: Arimo