Let’s build open source tensor libraries for data science? Sounds good! But wait, this sounds familiar…
Years ago I was dabbling in computational quantum chemistry, which exposed me to the wild and wacky world of quantum physics, and the almost-as-wild-and-almost-as-wacky world of numerical approximations to Schrödinger’s equation.
It was there that I encountered tensors. Basically, tensors are generalizations of matrices, vectors, and even scalars. If a matrix has 2 dimensions, then it stands to reason that a vector has 1, and a scalar, which needs no index at all, has 0. Anyway, my point is: a scalar is a 0-tensor, a vector is a 1-tensor, a matrix is a 2-tensor, and it follows that we can have arbitrary N-tensors, which have N “dimensions” (more precisely, N indices; the usual term is the tensor’s order, sometimes called its rank).
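To make that counting concrete, here is a minimal Python sketch (my own illustration, not taken from any tensor library) that stores tensors as nested lists and counts how many indices you have to supply before you reach a single number. The `order` helper is hypothetical and assumes every level of nesting is a non-empty list.

```python
def order(t):
    """Count the indices ("dimensions") of a tensor stored as nested lists.

    Hypothetical helper: assumes every nesting level is a non-empty list,
    so t[0] always exists while t is still a list.
    """
    n = 0
    while isinstance(t, list):
        n += 1      # one more index needed to get inside this level
        t = t[0]    # step down one level
    return n


scalar = 3.0                          # 0-tensor: no index, just a number
vector = [1.0, 2.0]                   # 1-tensor: v[i]
matrix = [[1.0, 0.0], [0.0, 1.0]]     # 2-tensor: M[i][j]
cube = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]  # 3-tensor: T[i][j][k]

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, order(t))
```

This prints 0, 1, 2 and 3 for the scalar, vector, matrix and cube respectively. Nothing stops you from nesting one level deeper to get a 4-tensor, and so on.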
We’ve long had well-established linear algebra libraries, most prominently LAPACK, which handles matrices (*ahem*, I mean 2-tensors) quite happily but apparently does not generalize very well to tensors with N > 2. That’s understandable, since N <= 2 is very special-casey. There are some general-purpose tensor libraries out there, but none of them seems to cover enough tensor functionality.
Tensors (of the N > 2 variety) had been causing computational chemists grief for some time already, so some cool people created the Tensor Contraction Engine. I don’t know if the project is still actively maintained, nor do I know if it covers precisely what the author of the paper I opened with is looking for.
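For anyone who hasn’t met the term: a tensor contraction multiplies two tensors and sums over an index they share, which makes ordinary matrix multiplication the N = 2 special case. Below is a minimal pure-Python sketch of one particular contraction, C[i][j][l] = Σ_k T[i][j][k] · M[k][l]. The function name `contract_last_first` is hypothetical, and this is only an illustration of the idea, not how the Tensor Contraction Engine works internally. (If you have NumPy handy, the same contraction is `np.einsum('ijk,kl->ijl', T, M)`.)

```python
def contract_last_first(T, M):
    """Contract the last index of 3-tensor T with the first index of matrix M.

    Computes C[i][j][l] = sum over k of T[i][j][k] * M[k][l].
    Hypothetical helper; tensors are nested lists of numbers.
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    if len(M) != K:
        raise ValueError("shared index k must have the same length in T and M")
    L = len(M[0])
    return [[[sum(T[i][j][k] * M[k][l] for k in range(K))  # sum over shared k
              for l in range(L)]
             for j in range(J)]
            for i in range(I)]


T = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]   # shape 2 x 2 x 2
M = [[1, 0, 2],
     [0, 1, 3]]          # shape 2 x 3
C = contract_last_first(T, M)
print(C)  # [[[1, 2, 8], [3, 4, 18]], [[5, 6, 28], [7, 8, 38]]], shape 2 x 2 x 3
```

Writing one loop nest for one index pattern is easy. The hard part, and presumably what a serious tensor library has to solve, is doing this fast for every choice of contracted indices and every N.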
The computational quantum chemistry field never really exploded the way the data science field is exploding now. Still, I think everybody understands that so much of the “new” stuff underpinning data science actually has older academic roots, so a healthy impulse would be to look backwards. Lesson number 1: don’t reinvent the wheel.
And more importantly, maybe the reason the Tensor Contraction Engine did not endure (unless it did?) is that it settled into a domain-specific mandate. I’ll be the first to defend an application-specific approach (or my PhD would be a 4-year act of blatant hypocrisy), but clearly it’s not just data scientists who need decent tensor support, and clearly it’s not just theoretical chemists either. So lesson number 2 could be: opportunities to generalize numerical libraries beyond their founding discipline ought to be eagerly embraced.