[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
7.1 Dropping Slow or Dead Slaves
7.2 Strategies for Greater Concurrency
7.3 Improving Performance
7.4 Long Jobs and Courtesy to Others
7.1 Dropping Slow or Dead Slaves
When TOP-C recognizes a dead slave, the master terminates communication with that slave and resubmits that slave's task to a different slave. (Currently, as of TOP-C 2.5.0, if a slave dies near the end of a computation, after all tasks have been generated, TOP-C may fail to recognize that the slave is dead.)
It is sometimes unclear whether a slave process is dead or merely slower than the others. Even a slow slave process may hurt overall performance by delaying other processes. TOP-C internally declares a slave process to be "slow" if there are N slaves and 3*N other tasks return after the given slave's task is "due". If a slow slave has not returned by slave-timeout seconds (see 5.2 Command Line Options in TOP-C Applications), then the slave is considered dead. The master process sends no further tasks to that slave, and sends a copy of the original task to a new slave.
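The rule above can be written out as a small decision function. This is only a sketch of the rule as stated in the text, not TOP-C's internal code: the names `slave_status` and `classify_slave` are hypothetical, the waiting time is assumed to be measured from when the task was sent, and a timeout of 0 is assumed to disable the "dead" test.

```c
#include <assert.h>

/* Hypothetical status codes for this sketch; not TOP-C identifiers. */
typedef enum { SLAVE_OK, SLAVE_SLOW, SLAVE_DEAD } slave_status;

/*
 * num_slaves:        N in the text above.
 * returned_past_due: number of tasks from other slaves that returned
 *                    after this slave's task was "due".
 * seconds_waiting:   assumed to be the time since the task was sent.
 * slave_timeout:     timeout in seconds; 0 is assumed to disable it.
 */
slave_status classify_slave(int num_slaves, int returned_past_due,
                            double seconds_waiting, double slave_timeout)
{
    assert(num_slaves > 0);
    if (returned_past_due < 3 * num_slaves)
        return SLAVE_OK;                 /* not yet "slow" */
    if (slave_timeout > 0 && seconds_waiting >= slave_timeout)
        return SLAVE_DEAD;               /* slow and past the timeout */
    return SLAVE_SLOW;
}
```

With four slaves, for example, a slave would be classified as slow once twelve other tasks have returned after its task was due.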
7.2 Strategies for Greater Concurrency
In DoTask() and UpdateSharedData(), save partial computations in global private variables. Then, in the event of a REDO action, TOP-C guarantees to invoke DoTask() again on the original slave process or slave thread. That slave may then use the previously computed partial results to shorten the required computation. Note that pointers on the slave to input and output buffers from previous UPDATE actions and from the original task input will no longer be valid. The slave process must copy any data it wishes to cache to global variables. In the shared memory model, those global variables must be thread-private (see section 8.4.2 Thread-Private Global Variables). Use TOPC_is_REDO() to test for a REDO action.
In CheckTaskResult(), the master may merge two or more task outputs in an application-independent way. This may avoid the need for a REDO action, or it may reduce the number of required UPDATE actions.
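The caching of partial results described above might look roughly like the following. This is a self-contained sketch that does not link against TOP-C: `do_task_sketch`, `shared_offset`, `expensive_calls` and the `cache` structure are hypothetical names, and the `is_redo` parameter stands in for the value that TOPC_is_REDO() would report. The task is assumed to have a slow phase that depends only on the task input and a cheap phase that depends on the shared data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application state for this sketch. */
static long shared_offset = 0;     /* stands in for the shared data       */
static int  expensive_calls = 0;   /* counts how often the slow phase ran */

/* Cache kept in a global so that it survives between invocations.
 * In the shared memory model it would have to be thread-private. */
static struct {
    int  valid;
    int  n;         /* copy of the task input, not a pointer to it  */
    long partial;   /* result of the phase that ignores shared data */
} cache;

/* Slow phase: depends only on the task input. */
static long sum_of_squares(int n)
{
    long s = 0;
    for (int i = 1; i <= n; i++)
        s += (long)i * i;
    expensive_calls++;
    return s;
}

/* is_redo stands in for the value of TOPC_is_REDO(). */
long do_task_sketch(const int *input, int is_redo)
{
    assert(input != NULL);
    if (!(is_redo && cache.valid && cache.n == *input)) {
        cache.n = *input;                        /* copy the data itself */
        cache.partial = sum_of_squares(cache.n);
        cache.valid = 1;
    }
    return cache.partial + shared_offset;        /* cheap phase */
}
```

On a REDO for the same input, only the cheap phase runs again. The input value is copied into the cache because, as noted above, the original input buffer is no longer valid by then.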
7.3 Improving Performance
If your application runs too slowly due to excessive time for communication, consider running multiple slave processes on a single processor. This allows one process to continue computing while another is communicating or idle waiting for a new task to be generated by the master.
If communication overhead or idle time is still too high, consider whether it is possible to increase the granularity of your task. TOP-C can aggregate several consecutive tasks into a single larger task to be performed by a single process. This amortizes the latency of a single network message over several tasks. For example, you can combine five tasks by invoking `--TOPC-aggregated-tasks=5' on the command line of the application. Alternatively, execute the statement:
    TOPC_OPT_aggregated_tasks=5;
before TOPC_master_slave(). In this situation, the five task outputs will also be bundled as a single network message. Currently (TOP-C 2.5.0), this works only if all tasks return NO_ACTION. TOP-C will signal an error if TOPC_OPT_aggregated_tasks > 1 and any action other than NO_ACTION is returned.
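To get a feel for how much latency aggregation can save, a back-of-the-envelope model may help. The sketch below is an assumed cost model, not a description of TOP-C internals: it counts one input message and one output message per group of aggregated tasks and charges a fixed latency per message, ignoring bandwidth and computation time. The function names are hypothetical.

```c
#include <assert.h>

/* Number of messages in one direction when n_tasks tasks are sent in
 * groups of `aggregated` tasks (the last group may be smaller). */
long messages_needed(long n_tasks, long aggregated)
{
    assert(n_tasks >= 0 && aggregated >= 1);
    return (n_tasks + aggregated - 1) / aggregated;
}

/* Total latency cost in seconds, counting one input message and one
 * output message per group. */
double latency_overhead(long n_tasks, long aggregated, double latency)
{
    return 2.0 * (double)messages_needed(n_tasks, aggregated) * latency;
}
```

Under this model, 100 tasks with a setting of 5 need 20 messages in each direction instead of 100, so the latency cost drops by a factor of five.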
Other useful techniques that may improve performance of certain applications are:
Replace LIBMPI in `.../top-c/Makefile' by your vendor's `libmpi.a' or `libmpi.so', and delete or modify the LIBMPI target in the `Makefile'. Alternatively, see the appendix, C. Using a Different `MPI' with TOP-C, for a more general way to use a different MPI dialect.
First, cc is recommended over gcc for SMP, due to specialized vendor-specific architectural issues. Second, if a thread completes its work before using its full scheduling quantum, the operating system may yield the CPU of that thread to another thread, potentially including a thread belonging to a different process. There are several ways to defend against this. One defense is to ensure that the time for a single task is significantly longer than one quantum. Another defense is to ask the operating system to give you at least as many "run slots" as you have threads (slaves plus master). Some operating systems use pthread_setconcurrency() to allow an application to declare this information, and TOP-C invokes pthread_setconcurrency() where it is available. However, other operating systems may have alternative ways of tuning the scheduling of threads, and it is worthwhile to read the relevant manuals of your operating system.
7.4 Long Jobs and Courtesy to Others
In the distributed memory model, infinite loops and broken socket connections tend to leave orphaned processes running. In the TOP-C distributed memory model, a slave times out if a task lasts longer than half an hour or if the master does not reply within half an hour. This is implemented with the UNIX system call alarm(). Half an hour (1800 seconds) is the default timeout period. The command line option `--TOPC-slave-timeout=num' allows one to change this default. If num is 0, then there is no timeout and TOP-C makes no use of SIGALRM.
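For readers unfamiliar with alarm(), the general watchdog pattern can be sketched as follows. This illustrates the system call only and is not TOP-C's own code: `arm_watchdog` and `on_timeout` are hypothetical names, and the choice to exit with status 1 on timeout is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Runs when the alarm expires; uses only async-signal-safe calls. */
static void on_timeout(int signo)
{
    static const char msg[] = "timeout: exiting\n";
    (void)signo;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done here */
    }
    _exit(1);
}

/* Hypothetical helper: (re)arm a watchdog of `seconds` seconds.
 * A value of 0 cancels any pending alarm and installs no handler.
 * Returns 0 on success, -1 if the handler could not be installed. */
int arm_watchdog(unsigned int seconds)
{
    struct sigaction sa;

    if (seconds == 0) {
        alarm(0);
        return 0;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    alarm(seconds);   /* replaces any previously scheduled alarm */
    return 0;
}
```

A process using this pattern would call arm_watchdog(1800) before starting each task and again before waiting for a reply, so that either a long task or a silent peer ends the process instead of leaving it orphaned.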
The application writer may also find some of the following UNIX system calls useful for allowing large jobs to coexist with other applications.
setpriority(PRIO_PROCESS, getpid(), prio)
    #include <unistd.h>
    #include <sys/resource.h>
    Call this in main().

setrlimit(RLIMIT_RSS, &rlp)
    #include <sys/resource.h>
    struct rlimit rlp;
    rlp.rlim_max = rlp.rlim_cur = SIZE;
    Call this in main(). (Not all operating systems enforce this request.)
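The two calls above could be wrapped as in the following sketch. The helper names `set_own_priority` and `limit_resident_memory` are hypothetical. Unlike the fragment above, which sets both rlim_max and rlim_cur, this sketch lowers only the soft limit, on the assumption that the application may want to raise it again later; an unprivileged process cannot raise a hard limit once it has been lowered.

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <unistd.h>

/* Set the scheduling priority ("nice" value) of this process.
 * Larger values mean lower priority. Returns 0 on success, -1 on error. */
int set_own_priority(int prio)
{
    return setpriority(PRIO_PROCESS, (id_t)getpid(), prio);
}

/* Request a soft limit of max_bytes on resident memory. The hard limit
 * is left unchanged. Returns 0 on success, -1 on error. Not all
 * operating systems enforce this limit. */
int limit_resident_memory(rlim_t max_bytes)
{
    struct rlimit rlp;

    if (getrlimit(RLIMIT_RSS, &rlp) != 0)
        return -1;
    if (rlp.rlim_max != RLIM_INFINITY && max_bytes > rlp.rlim_max)
        max_bytes = rlp.rlim_max;   /* soft limit cannot exceed hard limit */
    rlp.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_RSS, &rlp);
}
```

Both helpers would typically be called once near the start of main(), for example set_own_priority(10) followed by limit_resident_memory() with a size suited to the machine.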