For those who need the ultimate in compute power, but can't accommodate a computer that would fill an entire room, NVIDIA's DGX-1 'supercomputer' fits the bill. Unveiled earlier this year at NVIDIA's own GTC, the DGX-1 is a server unit that packs in eight of the company's ultra-powerful Tesla V100s, delivering a peak 960 TFLOPS of half-precision performance through their Tensor Cores.
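To put that headline number in perspective, here's a quick back-of-the-envelope sketch that splits the quoted 960 TFLOPS peak across the eight GPUs. It assumes the peak is a simple sum of identical per-GPU peaks, which is an idealization; the function name `per_gpu_peak` is ours, purely for illustration.

```python
# Figures quoted above for the DGX-1 (peak, half-precision).
NUM_GPUS = 8
PEAK_TFLOPS_TOTAL = 960


def per_gpu_peak(total_tflops, num_gpus):
    """Split an aggregate peak evenly across GPUs.

    Assumes every GPU contributes the same peak throughput,
    which ignores any real-world scaling losses.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return total_tflops / num_gpus


per_gpu = per_gpu_peak(PEAK_TFLOPS_TOTAL, NUM_GPUS)
print(f"{per_gpu:.0f} TFLOPS per GPU")  # prints "120 TFLOPS per GPU"
```

That works out to 120 TFLOPS of peak half-precision throughput per V100. It's a theoretical ceiling, not something a benchmark like Geekbench would be expected to hit.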
We're not sure how many of these monsters are out in the wild, but one anonymous Geekbench user had access to one long enough to run the test, and immediately broke world records. As you can see in the shot below, the DGX-1 running either OpenCL or CUDA simply dominated the rest, although the reasons for that go beyond the obvious.
As mentioned, the DGX-1 includes eight GPUs, whereas the competition seems to have been running four-GPU systems. That assumption is based on a Quadro P100 result from a dual-CPU Intel Xeon E5-2690 v4 system, which implies a cap of four GPUs. Unfortunately, Geekbench doesn't seem to display configuration details such as the number of GPUs, making the results a bit harder to interpret.
Nonetheless, it's not surprising to see eight V100s slaughter four P100s, but it's still just an awesome thing to see. It's also interesting to note that in some cases, NVIDIA's CUDA API can prove much faster than OpenCL on its hardware, once again proving that it really helps to understand your workloads.
With the DGX-1 giving NVIDIA a major lead over itself, it feels like it's going to be a while before we see this record broken by any significant margin (i.e., not just with a retest hoping for better results). Tesla V100s are likely to be the flagship for the next couple of years, and based on what we know from AMD's Vega, NVIDIA is probably going to be safe at the top of Geekbench for the foreseeable future, especially considering its CUDA API delivers a huge performance boost by itself.