This README describes how to get the most basic working
torque service on a single host.

To set up a basic single-node localhost-only batch system, install the
torque-server, torque-mom, and torque-scheduler packages, and do something like
this:

0) If torque is built with munge support, munge
   must be enabled first on all nodes. The munge
   package should already be installed.

   Create a munge key with

     /usr/sbin/create-munge-key

   Copy the resulting key /etc/munge/munge.key to
   all torque nodes in your cluster, including
   pbs_server, pbs_mom and client (qstat, qsub) nodes.

1) Get your full hostname with

     # /bin/hostname --long

   e.g. myhost.example.org

2) Edit /etc/torque/server_name
   to contain the single line

     myhost.example.org

3) Edit /etc/torque/mom/config
   to contain the single line

     $pbsserver myhost.example.org

4) Create a torque serverdb file.

     # /usr/sbin/pbs_server -D -t create

   Warning: this will remove any existing serverdb
   file located at /var/lib/torque/server_priv/serverdb

   pbs_server keeps running after it has created the file,
   which takes only a moment; press Ctrl-C to stop it.

5) Start the pbs_server and configure it.

     # service pbs_server start
     # qmgr -c "s s scheduling=true"
     # qmgr -c "c q batch queue_type=execution"
     # qmgr -c "s q batch started=true"
     # qmgr -c "s q batch enabled=true"
     # qmgr -c "s q batch resources_default.nodes=1"
     # qmgr -c "s q batch resources_default.walltime=3600"
     # qmgr -c "s s default_queue=batch"

6) Add one batch worker to your pbs_server.

     # qmgr -c "c n myhost.example.org"

7) Start the pbs_mom and pbs_sched daemons.

     # service pbs_mom start
     # service pbs_sched start

8) Use chkconfig to start the services at boot time.

     # /sbin/chkconfig pbs_mom on
     # /sbin/chkconfig pbs_server on
     # /sbin/chkconfig pbs_sched on
     # /sbin/chkconfig munge on

9) Submit a test job.
   As a normal user, not as root, run the following:

     $ qsub <<EOF
     hostname
     echo "Hi I am a batch job running in torque"
     EOF

10) Monitor the state of that job with qstat.
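
The two single-line files from steps 2 and 3 can also be written by a
small shell function. This is only a sketch: write_torque_host_config
is a hypothetical helper, not something shipped with torque, and the
optional second argument (a root prefix) exists only so the function
can be tried out in a scratch directory.

```shell
# Hypothetical helper, not part of torque: writes the files from steps 2 and 3.
#   $1 = fully qualified hostname, e.g. myhost.example.org
#   $2 = optional root prefix (assumption for dry runs; empty means the real /etc)
write_torque_host_config() {
    fqdn=$1
    root=${2:-}
    if [ -z "$fqdn" ]; then
        echo "usage: write_torque_host_config FQDN [ROOT]" >&2
        return 1
    fi
    # Make sure the mom directory exists below the chosen prefix.
    mkdir -p "$root/etc/torque/mom" || return 1
    # Step 2: server_name holds the hostname on a single line.
    printf '%s\n' "$fqdn" > "$root/etc/torque/server_name" || return 1
    # Step 3: mom/config points pbs_mom at the server.
    # Single quotes keep $pbsserver literal instead of expanding it.
    printf '$pbsserver %s\n' "$fqdn" > "$root/etc/torque/mom/config" || return 1
}
```

Run as root, write_torque_host_config "$(/bin/hostname --long)" would
write both files under /etc/torque, overwriting whatever they contained.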

In case of problems, look first in /var/log/torque.
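
When looking through the logs, it can help to list only the files that
changed recently. The function below is a sketch of that idea;
list_recent_torque_logs is a hypothetical helper, not part of torque.
It assumes a find that supports -mmin (GNU and BSD find do), and it
searches subdirectories in case the logs are kept in per-daemon
subdirectories.

```shell
# Hypothetical helper, not part of torque: list recently modified log files.
#   $1 = log directory (defaults to /var/log/torque)
#   $2 = age window in minutes (defaults to 60)
list_recent_torque_logs() {
    logdir=${1:-/var/log/torque}
    minutes=${2:-60}
    if [ ! -d "$logdir" ]; then
        echo "no such directory: $logdir" >&2
        return 1
    fi
    # Print regular files modified within the last $minutes minutes,
    # including files in subdirectories, in sorted order.
    find "$logdir" -type f -mmin "-$minutes" | sort
}
```

Called without arguments right after a failed qsub, it prints the log
files touched in the last hour, which are the ones worth reading first.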