Hi people.
I'm using a server to run a bunch of simulations. By bunch I mean hundreds. Each simulation takes from 10 minutes to 10 hours to run. All of the simulations are run from the command line. Every day I generate more simulation cases.
I'm looking for a method/system/app that I can give a list of tasks to, and that will run them on the server automatically, one after another.
How could I do this?
Thanks
In the past I have used one called Torque, which is in the repos. It has a client piece and a server piece. You would have to define the queue so that it runs only a single job at a time, then submit the jobs in order.
users mailing list -- users@lists.fedoraproject.org To unsubscribe send an email to users-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
On Jan 21, 2022, at 13:08, Roger Heflin rogerheflin@gmail.com wrote:
Torque, based on the old PBS code, is very common on HPC clusters, so this is a good recommendation. It might be overkill for a single user, but it is a great tool.
Other schedulers commonly used:
SLURM: https://slurm.schedmd.com/
HTCondor: https://research.cs.wisc.edu/htcondor/
(There are others, but their names evade me.)
-- Jonathan Billings
Thanks for the replies, I'll look into the mentioned packages.
I have never heard of Torque.
Is there anything simpler than Torque?
On 21Jan2022 10:39, linux guy linuxguy123@gmail.com wrote:
I'm looking for a method/system/app that I can give a list of tasks that will run them on the server, automatically, one after another.
If you can define a task in a single line of text you could run something like this on the server:
tail -f task_list.txt | while read -r spec; do run the task from $spec; done
Put that in a tmux or screen session.
Task submission is then just appending a spec to the text file:
echo "specification here" >> task_list.txt
Dumb as rocks, but effective. I've run simple workers like this.
Probably "run the task from $spec" should invoke a shell script to run exactly one task collecting the output, logging the times etc.
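For example, such a wrapper might look like this. A minimal sketch only: the function name, the logs/ directory, and the log-file naming are all made up here, not part of Cameron's suggestion.

```shell
#!/bin/sh
# Sketch of a per-task wrapper: run exactly one task spec, capture its
# output, and log start/end times. All names here are illustrative.
run1() {
    spec=$1
    logdir=${LOGDIR:-logs}
    mkdir -p "$logdir"
    # one log file per task; timestamp plus PID keeps names unique enough
    log="$logdir/task-$(date +%Y%m%d-%H%M%S)-$$.log"
    {
        echo "START $(date) spec: $spec"
        sh -c "$spec"
        echo "EXIT $? at $(date)"
    } >"$log" 2>&1
}

run1 'echo demo task'   # a log appears under ./logs/
```

The worker loop would then call run1 "$spec" instead of the placeholder, and the after-the-fact check is just reading the log files.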
Cheers, Cameron Simpson cs@cskk.id.au
Thought: I'll write a simple one in nodejs and make the user interface a webpage. That way I can log into the webpage from anywhere and check on the status of my simulations as well as add and delete them.
I'm not running a cluster. Just my little ole server.
Thoughts ?
On Fri, Jan 21, 2022 at 12:19 PM Cameron Simpson cs@cskk.id.au wrote:
If you can define a task in a single line of text
I can define my tasks in a single command line.
you could run something like this on the server:
tail -f task_list.txt | while read -r spec; do run the task from $spec; done
Nice.
Put that in a tmux or screen session.
Task submission is then just appending a spec to the text file:
echo "specification here" >> task_list.txt
Love it.
Dumb as rocks, but effective. I've run simple workers like this.
I might build a nice wrapper around it, but that will work.
Probably "run the task from $spec" should invoke a shell script to run
exactly one task collecting the output, logging the times etc.
I can redirect the task output to text files so I can check what happened after the fact.
Thanks for the reply.
On Fri, 21 Jan 2022 11:31:25 -0700 linux guy linuxguy123@gmail.com wrote:
Is there anything simpler than Torque?
There is the standard UNIX batch command. It is about as simple as can be. Certainly much simpler than the VMS and MVS batch systems back in the day.
There also seem to be several Node.js batch queuing systems.
Jim
On Fri, 21 Jan 2022 at 13:40, linux guy linuxguy123@gmail.com wrote:
I have used GNU parallel for similar tasks, but not for several years. At the time I was using it, new capabilities were appearing regularly.
This is a simple shell script I wrote a long time ago which is invoked by a CGI script on a local web page and notices files that show up in a queue directory.
#!/bin/bash

# This script is started by the setuid program start-queue so that the
# CGI script which makes queue entries can have the queue processed
# as the right user.

# If the script detects a copy of itself already running, it exits and
# allows the existing copy to process the new queue entries when it gets
# around to it.

mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH

cd $mydir/../queue
if [ -f ".pid" ]
then
    exit 0
fi
echo $$ > .pid
mypid=`cat .pid`
if [ "$$" = "$mypid" ]
then
    # Looks like this instance got here first, start processing the queue
    trap "rm -f .pid" EXIT
    while true
    do
        nextqueue=`ls -1 2>/dev/null | head -1`
        if [ -f "$nextqueue" ]
        then
            handle-queue-entry "$nextqueue"
            rm -f "$nextqueue"
        else
            # We seem to have run out of queue entries to process. Exit now.
            exit 0
        fi
    done
fi
On Fri, Jan 21, 2022 at 11:31:25AM -0700, linux guy wrote:
Is there anything simpler than Torque?
Yeah, these schedulers do get kind of complex. You might be happy with the simple "batch" command.
On the other hand, if this is your field, it's probably worth your time to learn a bit about the more complicated systems, because you'll find them on compute clusters and HPC environments.
Another oldie-but-goodie is Condor. While most schedulers are meant for the server room (and Condor works well there too), this one has some neat features where you can set it up on people's desktop (or even laptop!) systems and it'll run things when they're not busy. So if you have more than one machine in your house, you might be able to get more simulations run more quickly that way.
I've needed this over the years, but all the ones I've seen appeared much too complex for my simple use case. I ended up writing my own using pyxmlrpc. Unfortunately I haven't used it for years and don't know if I could find it again (it was uploaded to PyPI at one time).
Are any of these batch systems simple to install, use, and maintain?
On Wed, Jan 26, 2022 at 12:29 PM Matthew Miller mattdm@fedoraproject.org wrote:
-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader
On Wed, 26 Jan 2022 12:59:23 -0500 Neal Becker ndbecker2@gmail.com wrote:
I see batch was included with my f34 system and condor is provided in the updates repo.
Fred
After this discussion, I needed a simple batch scheduling system. I tried installing and starting condor on F35. Never saw so many SELinux problems. Couldn't dnf remove it fast enough.
After a bit of searching, I found the system I wrote 11 years ago. https://pypi.org/project/batch-queue/
I just finished updating for py3 and a few more tweaks.
It's a very simple system that runs on the local host and allows you to submit jobs. It will schedule them, up to the number of CPUs, to run in parallel. There are commands to list the queue, kill jobs, and suspend and continue them. It does just what I need.
If it helps you too, that'd be great.
On 2022-02-04 09:04, Neal Becker wrote:
I'm confused. Why do you feel the need for an overly-complicated job scheduler, emulating the mainframe scheduling mess? Linux is a unix-like system. Why don't you simply use the already-installed batch command? It can easily handle thousands of simultaneously-scheduled batch jobs without bringing the system to its knees.
Unless your jobs are 100% cpu-bound, scheduling jobs by the number of cpus seems just wrong, leaving a lot of unused cpu cycles on the table. If your jobs are i/o-bound, then disk or network load seems like it should be taken into account, not cpus.
Luckily, the overall system load is always calculated for you with no complicated mechanisms required. The batch command by default will not schedule a job if the load > 1.5 so that you do not impact foreground processes very much. You can also renice your batch job as required to lessen the as-running impact even more.
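As an illustration of that gating behaviour (a sketch of the idea only, not batch's actual implementation; the below_load function is made up here), the check amounts to comparing the 1-minute load average from /proc/loadavg against a threshold:

```shell
#!/bin/sh
# Sketch of batch(1)-style load gating; not its real code.
# below_load exits 0 when the given load is under the threshold.
below_load() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l < t) }'
}

# The 1-minute load average is the first field of /proc/loadavg (Linux).
load=$(cut -d' ' -f1 /proc/loadavg)
if below_load "$load" 1.5; then
    echo "load $load is under 1.5: a queued job could start now"
else
    echo "load $load is 1.5 or more: the job would wait"
fi
```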
If what you really need is a CI/CD system, then use the correct tools for the job. Batch is generally not considered to be one of them. Install Jenkins or CircleCI or any of a dozen tools that are built to do this right.
--
John Mellor
My purpose is to queue up more tasks than #cpus and have #cpus run at a time in parallel. So if I have 120 jobs to run and 32 cores, I want to queue them all up and run 32 in parallel at a time. Or maybe I need to set --ncpus=16 to schedule 16 parallel jobs instead of 32 (my scheduler is very simple and doesn't know about free memory).
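That "run N at a time" pattern can also be sketched with plain xargs -P, assuming a fixed list of jobs (the tasks.txt file and the echo jobs here are purely illustrative):

```shell
#!/bin/sh
# Run the commands listed in tasks.txt, at most $NCPUS at once.
NCPUS=${NCPUS:-$(nproc)}
printf '%s\n' 'echo job1' 'echo job2' 'echo job3' > tasks.txt
# -P caps concurrency; -I CMD turns each input line into one "sh -c <line>"
xargs -P "$NCPUS" -I CMD sh -c CMD < tasks.txt
```

This only helps for a batch known up front; it doesn't handle jobs added to the queue while others are running, which is where the real schedulers earn their keep.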
On Fri, Feb 4, 2022 at 10:24 AM John Mellor john.mellor@gmail.com wrote:
On Fri, 4 Feb 2022 at 11:56, Neal Becker ndbecker2@gmail.com wrote:
Current GNU parallel has:
--memfree size
    Minimum memory free when starting another job. The size can be postfixed with K, M, G, T, P, k, m, g, t, or p (see UNIT PREFIX).
If the jobs take up very different amounts of RAM, GNU parallel will only start as many as there is memory for. If less than size bytes are free, no more jobs will be started. If less than 50% of size bytes are free, the youngest job will be killed and put back on the queue to be run later.
--retries must be set to determine how many times GNU parallel should retry a given job.
See also: --memsuspend
--memsuspend size
    Suspend jobs when there is less than 2 * size memory free. The size can be postfixed with K, M, G, T, P, k, m, g, t, or p (see UNIT PREFIX).
If the available memory falls below 2 * size, GNU parallel will suspend some of the running jobs. If the available memory falls below size, only one job will be running.
If a single job takes up at most size RAM, all jobs will complete without running out of memory. If you have swap available, you can usually lower size to around half the size of a single job - with the slight risk of swapping a little.
Jobs will be resumed when more RAM is available - typically when the oldest job completes.
--memsuspend only works on local jobs because there is no obvious way to suspend remote jobs.