Showing posts with label MOAB. Show all posts

pbs node manipulation Revision

It has been a long time since I used Torque. I thought I would pen down some common commands I will use.

  • pbsnodes -l (List node names and their state. If no state is specified, only nodes in the DOWN, OFFLINE, or UNKNOWN states are listed. Specifying a state string acts as an output filter. Valid state strings are "active", "all", "busy", "down", "free", "offline", "unknown", and "up". )
  • pbsnodes -o (Add the OFFLINE state. This is different from being marked DOWN. OFFLINE prevents new jobs from running on the specified nodes. This gives the administrator a tool to hold a node out of service without changing anything else. The OFFLINE state will never be set or cleared automatically by pbs_server; it is purely for the manager or operator.)
  • pbsnodes -c (Clear OFFLINE from listed nodes)
  • pbsnodes -a (All attributes of a node or all nodes are listed. This is the default if no flag is given)
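If you want to post-process the node list in a script, something like the following could be a starting point. It is only a sketch: it assumes that `pbsnodes -l` prints one node per line as a name followed by a comma-separated state field, and the node names in the sample are made up for illustration.

```python
def parse_pbsnodes_list(text):
    """Parse `pbsnodes -l`-style output into a {node_name: [states]} dict.

    Assumes each line looks like "<node name> <state[,state...]>".
    """
    nodes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, state_field = parts[0], parts[1]
        # A node may carry several states at once, e.g. "down,offline".
        nodes[name] = state_field.split(",")
    return nodes


# Hypothetical sample output; these node names are not from a real cluster.
sample = """\
node01               offline
node07               down,offline
"""

nodes = parse_pbsnodes_list(sample)
offline = sorted(name for name, states in nodes.items() if "offline" in states)
print(offline)
```

On a real cluster the `sample` string would be replaced by the captured output of the command.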

Learning MOAB - Using showq

showq displays information about queued jobs, including active, eligible, blocked, and/or recently completed jobs. Do note that the -c, -i, and -r flags can be used by administrators only.

Note 1: Display blocked jobs only
> showq -b

Note 2: Display details about recently completed jobs (MOAB Admin only)
# showq -c

Note 3: Display local and full resource manager job ids. If specified with the -i option, will display job reservation time. (If using MOAB
# showq -v -i

Note 4: Display jobs associated with specified constraint. Valid constraint include user, group, account, class and qos
> showq -w user=john
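If you call showq from a script, it may help to validate the constraint name before building the command. The sketch below is a hypothetical helper (the function name is my own) that only accepts the constraint names listed in Note 4.

```python
# Constraint names taken from Note 4 above.
VALID_CONSTRAINTS = {"user", "group", "account", "class", "qos"}


def showq_where_args(constraint, value):
    """Build the argument list for `showq -w <constraint>=<value>`."""
    if constraint not in VALID_CONSTRAINTS:
        raise ValueError(f"unsupported showq constraint: {constraint!r}")
    return ["showq", "-w", f"{constraint}={value}"]


print(" ".join(showq_where_args("user", "john")))
```

The returned list can be handed to `subprocess.run` on a host where showq is installed.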

Note 5: Display extended details about active (running) jobs
# showq -r
------------------------------------------------------
S - Job State. Either "R" for Running or "S" for Starting.
PAR - Partition in which job is running.
EFFIC - CPU efficiency of job.
XFACTOR - Current expansion factor of job, where XFactor = (QueueTime + WallClockLimit) / WallClockLimit
Q - Quality Of Service specified for job.
USERNAME - User owning job.
GROUP - Primary group of job owner.
MHOST - Master Host running primary task of job.
PROC  - Number of processors being used by the job.
--------------------------------------------------------
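To get a feel for the XFACTOR column, here is a small worked example of the formula above. The function name and the choice of seconds as the unit are my own assumptions; any unit works as long as both values use the same one.

```python
def xfactor(queue_time, wallclock_limit):
    """Expansion factor: (QueueTime + WallClockLimit) / WallClockLimit."""
    if wallclock_limit <= 0:
        raise ValueError("wallclock_limit must be positive")
    return (queue_time + wallclock_limit) / wallclock_limit


# A job that started immediately with a 1-hour limit.
print(xfactor(0, 3600))     # 1.0
# A job that waited 2 hours in the queue with a 1-hour limit.
print(xfactor(7200, 3600))  # 3.0
```

A job that has not waited at all has an XFactor of 1, and the value grows the longer the job sits in the queue relative to its requested wallclock limit.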

Learning MOAB. What is workload?

When you are doing High Performance Computing or Cloud, the concept of workload is very important. Basically, there are 4 kinds of workload (taken and summarised from the MOAB Admin Guide 5.2):

  1. Batch Workload - With a batch job, the job is submitted to a job queue and is run somewhere on the cluster as resources become available.
  2. Interactive Workload - Requestors are interested in an immediate response and are generally waiting for the interactive request to be executed before going on to other activities. To manage interactive jobs, the focus is usually on setting aside resources to guarantee immediate execution, or at least a minimal wait time.
  3. Calendar Workload - Calendar workload must be executed at a particular time and possibly in a regular periodic manner. Resource availability must be guaranteed at the needed time so that calendar jobs can run as required.
  4. Service Workload - Service workload processes externally generated transaction requests, while a scheduler or resource allocation mechanism provides the distributed service with the resources it needs to meet its backlog or response targets. Examples are web farms etc.

MOAB Adaptive Datacenter and Cloud Solution

If you are considering a MOAB solution for Cloud Computing, here are 4 online videos which might help with your decision:

  1. Adaptive DataCenter
  2. Key Benefits
  3. Internal Hosting
  4. External Hosting

Happy viewing.....

MOAB and Cloud Computing

For those who are keen on Cloud Computing and how MOAB can make HPC more Cloud-like, go to Chris's Blog on MOAB. It is an excellent resource.

Archive for the ‘Moab’ Category

MOAB and storing of statistics with ODBC

This documentation is a useful "How-To" that shows how to set up and configure Moab to connect to a MySQL database using the MySQL ODBC driver.

For more information, do look at MOAB Workload Manager, 22.2 ODBC

Submitted jobs placed on the queue and not able to run on HPC

Sometimes, when our users submit jobs via MOAB using Torque as the resource manager, their jobs are just not able to run even though there are resources to accommodate the run. Of course there are many ways to troubleshoot. Here is how I troubleshot one such case.

Step 1: Use the command checkjob on the job ID of the job that is stuck.
# checkjob 100001
--------------------------------------
Partition List:      xxxxxxxx
Flags:                 RESTARTABLE
Attr:                   checkpoint
StartPriority:       94
rejected for CPU              - (null)
rejected for State              - (null)
NOTE: job req cannot run in partition xxxxx (available procs do not meet requirements : 0 of 32 procs found)
---------------------------------------
The key hint is in the NOTE line: no nodes satisfy this job's request (0 of 32 procs found). But why? Run another command:


# qstat -f 100001
----------------------------------
euser = xxxxx
egroup = xxxxx
queue_rank = xxxxx
queue_type = xxxxx
etime = xxxxxxxx
submit_args = -l nodes=2:ppn=16 ./run.sh
-----------------------------------
The key issue is that the user submitted a job requesting 2 nodes with 16 cores each (32 procs in total, matching the checkjob NOTE). No node with 16 cores exists in our cluster configuration, so the job is not able to run.
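The same check can be sketched in a few lines of Python. This is only an illustration: it handles just the simple `nodes=N:ppn=P` form of a Torque resource request, and the cluster inventory (8 nodes with 8 cores each) is a made-up example, not our actual configuration.

```python
import re


def parse_nodes_request(spec):
    """Parse a 'nodes=N[:ppn=P]' string into (nodes, ppn)."""
    match = re.fullmatch(r"nodes=(\d+)(?::ppn=(\d+))?", spec)
    if match is None:
        raise ValueError(f"unsupported resource request: {spec!r}")
    nodes = int(match.group(1))
    # Treat a missing ppn as one processor per node.
    ppn = int(match.group(2)) if match.group(2) else 1
    return nodes, ppn


def can_satisfy(spec, cores_per_node):
    """Return True if enough nodes have at least `ppn` cores each."""
    nodes, ppn = parse_nodes_request(spec)
    eligible = [cores for cores in cores_per_node if cores >= ppn]
    return len(eligible) >= nodes


# Hypothetical inventory: 8 nodes with 8 cores each.
cluster = [8] * 8

print(can_satisfy("nodes=2:ppn=16", cluster))  # False: no node has 16 cores
print(can_satisfy("nodes=4:ppn=8", cluster))   # True
```

Note that the first request fails even though the cluster has 64 cores in total, because ppn is a per-node requirement.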

Using MOAB and Torque Commands together to analyse the problem is useful indeed.
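If you find yourself doing this often, the processor counts in the checkjob NOTE line can be pulled out with a regular expression. This is a rough sketch that assumes the NOTE wording shown in the output above; other MOAB versions may phrase it differently.

```python
import re

# Matches the "<found> of <requested> procs found" part of the NOTE line.
PROCS_RE = re.compile(r"(\d+) of (\d+) procs found")


def procs_found(checkjob_output):
    """Return (found, requested) from checkjob output, or None if absent."""
    match = PROCS_RE.search(checkjob_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# NOTE line copied from the checkjob output above.
note = ("NOTE: job req cannot run in partition xxxxx (available procs "
        "do not meet requirements : 0 of 32 procs found)")

print(procs_found(note))  # (0, 32)
```

A result where the first number is 0 is a good cue to look at the job's resource request with qstat -f.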