[ale] shared research server help

Todor Fassl fassl.tod at gmail.com
Wed Oct 4 18:32:32 EDT 2017


I manage a group of research servers for grad students at a university. 
The grad students use these machines to do the research for their Ph.D. 
theses. The problem is that they pretty regularly kill off each other's 
programs by using up all the RAM. Most of the machines have 256G of RAM. 
One student uses 200G and another 100G, and one or the other, often both, 
dies. Sometimes they bring the machines down by hogging the CPU or using 
up all the RAM. Well, the machines never actually crash, but they might 
as well be down.

We really, really don't want to force them to use a scheduling system 
like Slurm. They are just learning, and they might run the same piece of 
code 20 times in an hour.

Is there a way to set a limit on the total amount of RAM all of a user's 
processes can use? If so, we were thinking of setting it at 50% of the 
on-board RAM. Then it would take 3 students together to trash a machine. 
It might still happen, but it would be a lot less frequent.
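
To make the 50% figure concrete, here is a rough sketch of the arithmetic 
only, not of any enforcement mechanism. It assumes a 256G machine; the 
function name and parameters are made up for illustration.

```python
def per_user_cap(total_gb, percent=50):
    """Return (cap in GB, number of users at the cap needed to exceed RAM).

    Hypothetical helper: it only does the arithmetic, it enforces nothing.
    """
    cap_gb = total_gb * percent / 100
    # 100 // percent users at the cap fill RAM at most exactly;
    # one more user at the cap pushes demand past what is installed.
    users_to_exceed = (100 // percent) + 1
    return cap_gb, users_to_exceed


cap_gb, users = per_user_cap(256)
print(f"cap per user: {cap_gb:.0f}G, users needed to exceed RAM: {users}")
```

With a 50% cap, two students at the limit would exactly fill the machine, 
and a third is needed to push it over.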

Any other suggestions? Anything at all? Just keep in mind that we really 
want to keep it easy for the students to play around.


-- 
Todd

