Hello,
We are running compiled Fortran code on a node with 8 Volta GPUs, using the NVIDIA MPS control daemon. From the literature, we understand that there is only one MPS control daemon per node, but in the MPS documentation (1. Introduction — Multi-Process Service r550 documentation) we found the following passage:
“On Volta MPS, the restriction of one Linux user per MPS server may be relaxed to avoid reprovisioning the MPS server on each new user request. Under this mode, clients from all Linux users will appear as clients from the root user and connect to the root MPS server. It is important to make sure that isolation between different users (including the root user) can be safely disregarded before enabling this mode. Clients from all users will share the same MPS log files. The same error containment rules (refer to Memory Protection and Error Containment) also apply in this mode across clients from all users. For example, a fatal fault from one client may bring down a different user’s client that shares any GPU with the faulting client. To allow multiple Linux users share one MPS server, start the control daemon under superuser with the -multiuser-server option. This option is not supported on Tegra platforms.”
We are wondering whether it is possible on Volta GPUs for two users to share MPS at the same time. We tried starting the control daemon with -multiuser-server, without success. So far we have tried:
sudo nvidia-cuda-mps-control -d -multiuser-server
But we get the error:
Cannot find MPS control daemon process
What is the correct way to start the nvidia-cuda-mps-control daemon with the -multiuser-server flag?
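A diagnostic sequence for narrowing this down might look like the sketch below. It is not a confirmed fix. The `run` helper and the `DRY_RUN` variable are hypothetical names introduced here so the commands are only printed by default, and the log path assumes the default MPS log directory (it will differ if CUDA_MPS_LOG_DIRECTORY is set).

```shell
#!/bin/sh
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Check whether an MPS control daemon or server is already running.
run pgrep -a nvidia-cuda-mps

# 2. Start the control daemon as root with the flag quoted from the documentation.
run sudo nvidia-cuda-mps-control -d -multiuser-server

# 3. Ask the daemon for its server list; this only works if the daemon is up.
run sudo sh -c 'echo get_server_list | nvidia-cuda-mps-control'

# 4. Inspect the control log (assumed default location).
run sudo tail -n 20 /var/log/nvidia-mps/control.log
```

Setting DRY_RUN=0 executes the commands instead of printing them. If step 1 shows a daemon that was started earlier without the flag, that would be worth mentioning in any follow-up.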