This is the first post for this week, and I will use it to introduce some Python libraries that are being widely adopted by the deep learning community. I should also disclose today that The Information Age will change its posting schedule from five posts per week to three. The reason is that I plan to begin another blog project soon and will be busy in the meantime. The new project will be closely related to most of the content I have been posting here, so this blog should actually gain from the diversified schedule. I intend to post on Mondays, Wednesdays and Fridays; if for some reason this changes, I will give notice in advance, or mention the change in the relevant post.
Today I will share a video introducing a course on PyTorch and TensorFlow, the two Python/C++ libraries now taking deepest root in the deep learning and artificial intelligence communities. Further, below the video, I share a link to a fascinating post I found on Reddit, featuring a Q&A comparing the advantages and shortcomings of those two libraries, which I thought would be highly appropriate reading for anyone involved or interested in these subjects. Some of the highlights are in bold quotes, as usual in this blog.
Have any users here had extensive experience with both? What are your main concerns or delights with both libraries?
I never made a switch from Torch7 to Tensorflow. I played around with Tensorflow but I always found Torch7 more intuitive (maybe I didn’t play around enough!). I also had a tip that Pytorch was on the way, so decided I would wait for that.
After a few weeks using Pytorch, I don’t think I’ll be moving to Tensorflow any time soon, at least for my passion projects. It’s ridiculously simple to write custom modules in Pytorch, and the dynamic graph construction is giving me so many ideas for things that previously would’ve been achieved by late-night hacks (and possibly put on the wait list). I think Pytorch is an incredible tool set for a machine learning developer. I realise that the wealth of community resources is much stronger for Tensorflow, but when working on novel projects (instead of re-coding known architectures or reading tutorials) this isn’t always much help.
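To illustrate what the poster means by writing custom modules being "ridiculously simple": a minimal sketch, assuming PyTorch is installed (the module name `TinyNet` and its sizes are illustrative, not from the post). You just subclass `nn.Module` and define `forward()` in plain Python.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """A custom PyTorch module: subclass nn.Module, define forward()."""
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        # forward() is ordinary Python; the computation graph is built
        # dynamically each time this runs.
        h = torch.relu(self.fc1(x))
        return self.fc2(h)

net = TinyNet(4, 8, 2)
out = net(torch.randn(3, 4))   # batch of 3 inputs of size 4
print(out.shape)               # torch.Size([3, 2])
```

There is no separate graph-compilation step: building the module and calling it is all there is to it.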
I’ve been meaning to do a project in tensorflow so I can make a candid, three-way comparison between Theano+Lasagne, PyTorch, and Tensorflow, but I can give some rambling thoughts here about the first two. Background: I started with Theano+Lasagne almost exactly a year ago and used it for two of my papers. I switched over to PyTorch last week, and have reimplemented two of my key current projects which were previously in Theano.
API: The way Theano’s graph construction and compilation works was a bit of a steep learning curve for me, but once I got the hang of it everything clicked (this took about two months, but I was still learning python and basic neural net stuff so take that with a grain of salt). Lasagne’s API, to me, is elegant as Catherine the Great riding an orca into battle, which is to say I love it to death. I’ve always said that it’s the library I would write if I knew ahead of time exactly how I wanted a theano topper library to work, and it drastically eases a lot of the gruntwork.
PyTorch’s API, on the other hand, feels a little bit more raw, but there’s a couple of qualifiers around that, which I’ll get to in a moment. If you just want to do standard tasks (implement a ResNet or VGG) I don’t think you’ll ever have an issue, but I’ve been lightly butting heads with it because all I ever do is weird, weird shit. For example, in my current project I’ve had to make do with several hacky workarounds because strided tensor indexing isn’t yet implemented, and while the current indexing techniques are flexible, they’re a lot less intuitive than being able to just use numpy-style indexing. The central qualifier here is that they literally just released the friggin’ framework; of course not everything is implemented and there’s still some kinks to work out. Theano is old and well-established, and I wasn’t really around to observe any of its or Lasagne’s growing pains.
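For readers unfamiliar with what "strided indexing" means here, a minimal NumPy sketch of the idiom the poster is referring to (the array values are illustrative), plus the kind of explicit-index workaround frameworks without it force on you:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

# Strided slice: every other row, columns 1..2 -- the numpy-style
# indexing the post says PyTorch lacked at release.
sub = x[::2, 1:3]
print(sub)   # [[ 1  2]
             #  [ 9 10]]

# A typical workaround without strided indexing: build explicit
# index arrays and select along each axis separately.
rows = np.array([0, 2])
cols = np.array([1, 2])
sub2 = x[rows][:, cols]
assert (sub == sub2).all()
```

The slice form is one readable expression; the workaround spreads the same intent over several steps, which is the "less intuitive" cost being described.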
Newness aside, my biggest “complaint” with pytorch is basically that “things aren’t put together the way I would have put them together” on the neural net API side. Specifically, I really like Lasagne’s “layers” paradigm–but a little bit of critical thinking should lead you to the conclusion that that paradigm is specifically and exactly unsuited to a dynamic graph framework. I’m completely used to thinking and optimizing my thought processes around static graph definition, so making the switch API-wise is a minor pain-point. This is really critical–I’ve spent so long thinking about “Okay, exactly how would I define this graph in Theano, because I can’t just write it as I would a regular ole program with my standard flow control” that I’ve become really strong in that avenue of thinking.
Dynamic graphs, however, necessitate an API which is fundamentally different from the “define+run,” and while I personally don’t find it as intuitive, in the last week alone the ability to do define-by-run stuff has, as CJ said, opened my mind and given me half a dozen project ideas which previously would have been impossible. I also imagine that if you do anything with RNNs where you want to, say, implement dynamic computation time without wasted computation, the imperative nature of the interface is going to make it a lot easier to do so.
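A hedged sketch of the kind of define-by-run RNN trick the poster alludes to, assuming PyTorch is installed (the helper `run_until_converged`, its sizes, and its stopping rule are my own illustration, not the poster's code): because the graph is built as the code runs, the number of recurrent steps can depend on the data itself, with no special graph operators.

```python
import torch
import torch.nn as nn

# Define-by-run: the step count is decided by a plain Python loop,
# so it can vary per input -- no wasted computation on easy inputs.
cell = nn.RNNCell(input_size=8, hidden_size=8)

def run_until_converged(x, max_steps=10, tol=1e-3):
    h = torch.zeros(x.size(0), 8)
    for step in range(max_steps):
        h_next = cell(x, h)
        # Data-dependent stopping condition, checked at run time.
        if (h_next - h).abs().max() < tol:
            return h_next, step + 1
        h = h_next
    return h, max_steps

h, steps = run_until_converged(torch.randn(2, 8))
print(steps)   # however many steps this particular input needed
```

In a static define-and-compile framework, this loop would have to be expressed through dedicated scan/while constructs with a fixed graph, which is exactly the friction being described.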
Speed: So I haven’t done extensive benchmarks, but I was surprised to find that PyTorch was, out of the box, 100% faster at training time than theano+lasagne on single-GPU for my current project. I’ve tested this on a 980 and on a Titan X, with two implementations of my network which I have confirmed to be identical to within a reasonable margin of error. One. Hundred. Percent. Literally going from (in the simplest case) 5 mins/epoch to 2.5 mins/epoch on CIFAR100, and in some cases going down to 2 mins/epoch (i.e. more than twice as fast).
Relatedly, I’ve never been able to get multi-GPU or half-precision floats working with theano, ever. I’ve spent multiple days trying to get libgpuarray working and I’ve tinkered a bit with platoon, but each time I’ve come away exhausted (assuming I can even get the damn sources to compile, which was already a pain point). Out of the box, however, PyTorch’s data-parallelism (single node, 4 GPUs) and half-precision (pseudo-FP16 for convolutions, which means it’s not any faster but it uses way less memory) just…worked. I was stunned by this as well.
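For concreteness, a minimal sketch of the two "just worked" features mentioned, assuming PyTorch is installed (the model and sizes are illustrative). `nn.DataParallel` splits a batch across visible GPUs on a single node, and falls back to a plain forward pass when no GPU is present; `.half()` converts tensors to FP16 storage.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

# Single-node data parallelism: wrap the module; PyTorch scatters the
# batch across available GPUs (or just runs normally on CPU).
parallel_model = nn.DataParallel(model)
out = parallel_model(torch.randn(8, 16))
print(out.shape)   # torch.Size([8, 4])

# Half precision: FP16 storage uses half the memory of FP32, which is
# the benefit the poster saw even without a speedup.
t = torch.randn(8, 16).half()
print(t.dtype)     # torch.float16
```

Compare this with the multi-day libgpuarray/platoon setup described above: the wrapper is one line.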
Dev Interactions: My interactions with the core dev teams of both frameworks have been obscenely pleasant. I’ve come to the Lasagne and Theano guys with difficulties and questions about weird stuff many, many times and they’ve always promptly and succinctly helped me figure out what was wrong (usually what I didn’t understand). The PyTorch team has been just as helpful–I’ve been bringing up bugs or issues I encounter and getting near-immediate responses, often accompanied by same-day fixes, workarounds, or issue trackers. I haven’t worked in Keras or in Tensorflow, but I have taken a look at their “Issues” dockets and some usergroups and just due to the sheer volume of users these frameworks have it doesn’t look like it’s possible to get that kind of individual attention–it almost feels like I’m going to Cal Poly (where the faculty:student ratio is really high and you rarely have any more than 20 students in a class) while looking over at people in a 1,000 people lecture hall at Berkeley. That’s not at all to condemn the Cal kids or imply in any way that the analogical berk doesn’t work, but if you’re someone like me who’s into non-standard neural net stuff (we’re talking Chuck Tingle weird) then having the ability to get quick responses from the guys who actually build the framework is invaluable.
Misc: The singular issue I’m worried about (and why I’m planning on picking up TensorFlow this year and having all three in my pocket) is that neither Theano nor PyTorch seem designed for deployment, and it doesn’t look like that’s a planned central focus on the PyTorch roadmap (though I could be wrong on this front, I vaguely recall reading a forum post about this). I’d like to practice deploying some stuff onto a website or droid app (mostly for fun, but I’ve been crazy focused on research and I think it would be a real useful skill to be able to actually get something I made onto a device), and I’m just not sure that the other frameworks support that quite as well.
Relatedly, PyTorch’s distributed framework is still experimental, and last I heard TensorFlow was designed with distributed in mind (if it rhymes, it must be true; the sky is green, the grass is blue [brb rewriting this entire post as beat poetry]), so if you need to run truly large-scale experiments TF might still be your best bet.
Needless to say, the tone and language of the Reddit post are geekier than is usual for this blog. But I thought it gave a nice impression of how an experienced practitioner sees the fields discussed in the video above, making it a good complementary read. Further research and careful, detailed study are due from any newcomer willing to work with and implement the libraries described and discussed here.
featured image: Intro to TensorFlow and PyTorch Workshop at Tubular Labs