Ticket #125 (closed enhancement: fixed)

Opened 3 months ago

Last modified 2 weeks ago

multi-GPU prototype

Reported by: joaander Assigned to: joaander
Priority: normal Milestone: 0.8.0
Component: Computes (GPU) Keywords:
Cc:

Description

A multi-GPU version of HOOMD needs to be prototyped up for demos & NVIDIA press releases. I've created the branch multigpu-prototype for this purpose.

Change History

09/02/08 17:47:09 changed by joaander

  • status changed from new to assigned.

r1194 implements a multi-GPU replicated-data strategy in ParticleData. Currently, the only way to synchronize data is to acquire it on the CPU and then on the GPU. I will be adding a syncPositions() function that synchronizes position data across all GPUs in the execution configuration.
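A rough host-only sketch of what a replicated-data position sync could look like. It assumes each GPU holds a full copy of the positions and updates only its own contiguous slice; that ownership scheme, the Replica struct, and the sliceBegin helper are illustrative assumptions, not the r1194 code. Plain std::vector stands in for device memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one GPU's full copy of the position array.
struct Replica {
    std::vector<float> x;
};

// Start index of the slice owned by `gpu` when n particles are split
// as evenly as possible across n_gpus devices (assumed ownership scheme).
std::size_t sliceBegin(std::size_t n, std::size_t n_gpus, std::size_t gpu) {
    return (n * gpu) / n_gpus;
}

// Copy each owner's slice into every other replica so all copies agree.
// In real code each inner copy would be a device-to-host-to-device transfer.
void syncPositions(std::vector<Replica>& replicas) {
    const std::size_t n_gpus = replicas.size();
    if (n_gpus == 0) return;
    const std::size_t n = replicas[0].x.size();
    for (std::size_t owner = 0; owner < n_gpus; ++owner) {
        const std::size_t begin = sliceBegin(n, n_gpus, owner);
        const std::size_t end = sliceBegin(n, n_gpus, owner + 1);
        for (std::size_t dst = 0; dst < n_gpus; ++dst) {
            if (dst == owner) continue;
            for (std::size_t i = begin; i < end; ++i)
                replicas[dst].x[i] = replicas[owner].x[i];
        }
    }
}
```

With this layout the bytes moved per sync grow with the total particle count rather than with the slice size, which is one reason a replicated-data approach tends to be communication-bound.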

09/02/08 20:23:57 changed by joaander

As of r1196, LJ forces are multi-GPU capable. Working and tested :)

09/03/08 15:20:46 changed by joaander

ExecutionConfiguration needs an allCall and/or allSync to make code that syncs or calls cudaThreadSynchronize on all devices cleaner. The multi-GPU computes then need to be updated to use it.
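One possible shape for such helpers, as a minimal sketch. The class name ExecConfSketch and both signatures are assumptions made for illustration; the per-device action is injected as a callback so the sketch compiles without CUDA. In real code the callback would select the device and call cudaThreadSynchronize().

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch; not the actual HOOMD ExecutionConfiguration.
class ExecConfSketch {
public:
    explicit ExecConfSketch(std::vector<int> device_ids)
        : m_devices(std::move(device_ids)) {}

    // Run `fn` once per device, in order, passing the device id.
    void allCall(const std::function<void(int)>& fn) const {
        for (int dev : m_devices) fn(dev);
    }

    // Synchronize every device. `sync_one` is injected so this runs
    // without CUDA; a real version would set the device and synchronize it.
    void allSync(const std::function<void(int)>& sync_one) const {
        allCall(sync_one);
    }

private:
    std::vector<int> m_devices;  // device ids in the execution configuration
};
```

A compute would then replace its hand-written per-device loop with a single call such as exec_conf.allCall(...) followed by exec_conf.allSync(...).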

09/03/08 16:25:03 changed by joaander

r1198 implements a multi-GPU neighbor list. Amdahl's law takes a bite out of us here since the binning is still done on the CPU. For the 64k particle benchmark, the 2-GPU speedup is 1.68 and the 3-GPU speedup is 2.15.

Integrators are up next.

09/03/08 21:28:21 changed by joaander

r1199 can run Lennard-Jones liquid simulations on multiple GPUs. Due to the communication and the poor scaling in the binning, overall scaling is somewhat poor. 2 GPUs -> 1.4x speedup. 3 GPUs -> 1.5x speedup.

When NVIDIA finally releases the fast GPU->GPU transfer feature, this scaling can be improved somewhat but more drastic improvements will require something exotic to reduce the number of bytes transferred.

09/08/08 21:39:56 changed by joaander

  • milestone set to 0.7.1.

r1224 includes the bond data replication and can now run the polymer systems on multiple GPUs. This is all going so successfully that I'm moving the completion target for this to 0.7.1.

11/06/08 20:58:55 changed by joaander

  • status changed from assigned to closed.
  • resolution set to fixed.

In r every single GPU compute and updater has been gone over to double check that they are programmed correctly.

  • The style has been made consistent (base class exec_conf, use of exec_conf.*All)
  • Those that were not set up for multi-GPU computation were modified to support it.
  • Those that had _comparison unit tests now have an additional comparison verifying the output of the multi-GPU computation mode.

I'm calling the prototype done.

There are still issues with memory usage (i.e. neighborlist is allocated the same size on all GPUs), but those will be fixed in #137.

There is still the issue of documentation, but that will be solved in #98 (and already partially is).

Further testing and validation will be done for #158, which may discover bugs I did not find in the development of the prototype.

11/06/08 21:00:16 changed by joaander

As usual, I forgot to update the revision number for the previous comment. It is r1441.

