Administrator's Guide

Release 5.0.5

Published April 2010

Summary of Contents for PARASTATION5 V5

Page 2: ...rTec logo and the ParaStation logo are trademarks of ParTec Cluster Competence Center GmbH. Linux is a registered trademark of Linus Torvalds. All other marks and names mentioned herein may be trademark...

Page 3: ...ys ps4 local 19; 5.2.4 p4stat 19; 5.3 Controlling process placement 19; 5.4 Using the ParaStation5 queuing facility 20; 5.5 Exporting environment variables for a task 20; 5.6 Using non-ParaStation applicat...

Page 4: ...tartup 31; 6.8 Problem: pssh fails 31; 6.9 Problem: psid does not startup, reports port in use 31; 6.10 Problem: processes cannot access files on remote nodes 32; I Reference Pages 33; parastation.conf 35; psia...

Page 5: ...re software project. The communication platform used then was Myrinet, a Gigabit interconnect developed by Myricom. The development of ParaStation2 still took place at the University of Karlsruhe. ParaSta...

Page 6: ...part of its portfolio. At the end of 2007, ParaStation5 was released, supporting MPI2 and even more interconnects and protocols, such as DAPL. ParaStation5 is backward compatible with the previous P...

Page 7: ...addition, a couple of libraries providing communication and management functionality must be installed. All libraries are provided as static versions, which will be linked to the application at compile...

Page 8: ...ork drivers. These drivers are based on standard device drivers for the corresponding NICs and are specially tuned for best performance within a cluster environment. They will also support all standard com...

Page 9: ...rst, a so-called administration network, which is used to handle all the administrative tasks that have to be dealt with within a cluster. Besides commonly used services like sharing of NFS partitions or...

Page 10: ...hin the 2.4 and 2.6 kernel streams. Using InfiniBand and Myrinet requires additional modules and may restrict the supported kernels. 3.2 Directory structure: The default location to install ParaStation5...

Page 11: ...re system packages supplying MPIch for GNU, Intel, Portland Group and Pathscale compilers are available. A documentation package is also obtainable. The full names of the RPM files follow a simple structu...

Page 12: ...are built on. While compiling the package, support for InfiniBand will be included if one of the following files was found. File (Version): /usr/mellanox/include/vapi/evapi.h (Mellanox), /usr/include/infiniban...

Page 13: ...rease performance and to minimize latency, it's highly recommended. Using the provided drivers does not influence other network communication. While installing the ParaStation management RPM, the file etc...

Page 14: ...le which are built using different compilers, like the PGI or Intel compilers on the Intel IA32 platform, the Intel compiler on the IA64 platform, and the PGI, Intel and Pathscale compilers on X86_64 platf...

Page 15: ...sting. These steps will be discussed in Chapter 4, Configuration. 3.7 Uninstalling ParaStation5: After stopping the ParaStation daemons, the corresponding packages can be removed using /etc/init.d/parastation...

Page 17: ...id(8). Most of these parameters are set to their default value within lines marked as comments. Only those that have to be modified in order to adapt ParaStation to the local environment are enabled. Addi...

Page 18: ...tarter and accounter may be ignored for now. For a detailed description of these parameters, refer to the parastation.conf(5) manual page. Usually the nodes will be listed in order of increasing ParaStat...

Page 19: ...o reload the new version of the network drivers, it is necessary to reboot the system. 4.3 Testing the installation: After installing and configuring ParaStation on each node of the cluster, the ParaStati...

Page 20: ...ging up all nodes, the communication can be tested using /opt/parastation/bin/test_nodes -np nodes, where nodes has to be replaced by the actual number of nodes within the cluster. After a while a result l...

Page 21: ...sfer data across various networks like InfiniBand or 10G Ethernet using a vendor-provided libdapl. QsNet: The libpscom supports the QsNetII transport layer. Using the libpscom4elan plug-in, it may transfe...

Page 22: ...nnections. polling: returns the current value for the polling flag. 0 = never poll; 1 = poll if otherwise idle (number of runnable processes vs. number of CPUs); 2 = always poll. Writing this value will immediately chan...

Page 23: ...idx 0 refs 10 Socket 2 Addr 70 6f 72 74 31 port144 last_idx 0 refs 10 opt parastation bin p4stat n net_idx SSeqNo SWindow RSeqNo RWindow lusridx lnetidx rnetidx snq rnq refs 84 30107 30467 30109 30468...

Page 24: ...scribed procedure will be circumvented and the processes will be run on the user-defined nodes. For a detailed discussion of placing processes within ParaStation5, please refer to process placement(7). ps...

Page 25: ...n 1. To run an administrative task, use pssh or mpiexec A n 1. For more details on how to start up serial and parallel jobs, refer to mpiexec(8), pssh(8) and the ParaStation5 User's Guide. 5.7 ParaStation5 TC...

Page 26: ...on/lib64/libpscomopenib.so. This variable is automatically exported to all processes started by ParaStation. Refer to Section 5.1, ParaStation5 pscom communication library, for a full list of available li...

Page 27: ...in parallel. The output of the individual commands is presented in a sophisticated manner, showing common parts and differences. psh may also be used to copy files to all nodes of the cluster in parallel...

Page 28: ...ript tok2env bin bash tmp IFS IFS export AFS_TOKEN GetToken uuencode dev stdout IFS tmp Script env2tok bin bash IFS echo AFS_TOKEN uudecode SetToken exec 5 15 Integrating external queuing systems Para...

Page 29: ...ion with PBS PRO. 5.15.4 Integration with LSF: Similar to Section 5.15.1, Integration with PBS PRO, ParaStation will also recognize the variable LSB_HOSTS provided by LSF. This variable holds a list of nod...

Page 30: ...conf or etc sysconfig networks routes, depending on the type of Linux distribution in use. 5.17 Copying files in parallel: To copy large files to many or all nodes in a cluster at once, pscp is very hand...

Page 31: ...on NUMA-based systems. This will give hints to the memory management subsystem of the operating system to select the nearest memory, if available. Memory binding may be enabled or disabled globally or on a...

Page 32: ...psidstarter to reflect the newly assigned port numbers. In addition, the ParaStation daemon psid(8) uses UDP port 886 for RDP connections. To change this port, use the RDPPort directive within parasta...

Page 33: ...s to be ok up to now, check for recent entries within the log file /var/log/messages. Be aware that the log facility can be modified using the LogDestination within the config file parastation.conf. Look for l...

Page 34: ...odes. Verify that the program is executable on all nodes. 6.4 Problem: bad performance. Verify that the proper interconnect and/or transport is used; check for environment variables controlling transport s...

Page 35: ...tmp username is accessible on each node, or change your current directory to a globally accessible directory. 6.8 Problem: pssh fails. Problem: users other than root cannot run commands on remote nodes usi...

Page 36: ...esses cannot access files on remote nodes. Problem: processes created by ParaStation on remote nodes are not able to access files if these files grant access only to a supplementary group the cur...

Page 37: ...Reference Pages. This appendix lists all reference pages related to ParaStation5 administration tasks. For reference pages describing user-related commands and information, refer to the Pa...

Page 39: ...iguration file template parastation.conf.tmpl contained in the distributed ParaStation system. The template file can be found in /opt/parastation/config. Parameters: The different parameters are discussed...

Page 40: ...communication hardware. This is mainly used in order to generate the lines shown by the status counter directive of the ParaStation administration tool psiadmin(1). headerscript: Define a script called in...

Page 41: ...ecognized. gm: Use communication over GM (Myrinet). The script ps_gm will load the Myrinet gm driver. PS_IPENABLED: If set to 1, the IP device myri0 is enabled after loading. elan: Use communication over QsNet...

Page 42: ...he default value of HWType is none. starter true | yes | 1 | false | no | 0: If the argument is one of yes, true or 1, all nodes declared within a Node statement will be allowed to start parallel tasks, unless otherwise...

Page 43: ...as the stand-alone commands to set the corresponding default value. E.g. the line "Node node17 16 HWType ethernet p4sock starter yes runJobs no" will define the node node17 to have the ParaStation ID 16. F...

Page 44: ...logs a huge amount of messages to the logging destination, which is usually syslog(3). This parameter can be set during runtime via the set psiddebug directive within the ParaStation administration a...

Page 45: ...number, the string infinity or the string unlimited. In the two latter cases the data size is set to RLIM_INFINITY. DataSize size: Set the maximum data size to size kilobytes. size is an integer number, the...

Page 46: ...p to 4.0.6 will be enabled. Keep in mind that this behavior might collide with the freeOnSuspend feature. If the argument is one of no, false or 0, ParaStation will disable compatibility mode. UseMCast tru...

Page 47: ...hose CPU slots and physical CPUs and cores is made using a mapping list. See CPUmap below. The pinProcs parameter can be set during runtime via the set pinprocs directive within the ParaStation administ...

Page 48: ...teers the actual load introduced by RDP. Within the daemon there is a lower limit of 100 msec for all timeout timers. Thus the minimal value here is 100, too. deadLimit number: Dead limit of the RDP status...

Page 49: ...ACK is sent piggyback within the next regular packet to this node, or as soon as a retransmission occurred. If set to 1, each RDP packet received is acknowledged by an explicit ACK. Errors: No known e...

Page 51: ...down single nodes or the whole system requires root privilege Options c command command Execute the single directive command and exit d Do not automatically start up the local psid 8 e echo Echo each...

Page 52: ...Comments begin with the # character and continue to the end of the line. Comments and blank lines are ignored by psiadmin. Upon startup, psiadmin tries to find the file psiadminrc first in the current directo...

Page 53: ...ware load mcast memory node proc cnt count rdp summary max max up version nodes list jobs state running state pending state suspended slots tid Report various states of the selected node s or job s De...

Page 54: ...follows. The total number of processes contains all processes managed by the ParaStation system, including Logger, Forwarder and psiadmin(1) processes. Furthermore, of course, the actual working processes s...

Page 55: ...e root processes of parallel tasks which converted to a ParaStation Logger process are tagged with L after the user ID System processes which are not counted are marked as Accounting processes are ind...

Page 56: ...p bindmem adminuser admingroup rl_addressspace rl_core rl_cpu rl_data rl_fsize rl_locks rl_memlock rl_msgqueue rl_nofile rl_nproc rl_rss rl_sigpending rl_stack supplementaryGroups statusBroadcasts rdp...

Page 57: ...e the job continues to run; this is the behavior as long as the flag has the value 0. Since the master node does all the resource management within the cluster, only the value on this node actually steer...

Page 58: ...unaccounted tasks rl_addressspace nodes Show RLIMIT_AS on this node rl_core nodes Show RLIMIT_CORE on this node rl_cpu nodes Show RLIMIT_CPU on this node rl_data nodes Show RLIMIT_DATA on this node rl...

Page 59: ...in the RDP facility in milliseconds. See also parastation.conf(5). rdpResendTimeout nodes: Show the resend timeout within the RDP facility in milliseconds. See also parastation.conf(5). rdpMaxACKPend nodes...

Page 60: ...node. In principle, nodes might contain an unlimited number of ranges. If the value of nodes is all, all nodes of the ParaStation cluster are selected. If nodes is empty, the node range preselected via the range...

Page 61: ...user name or to any user. If name is preceded by a + or -, this user is added to or removed from the list of users, respectively. group name | any nodes: Grant exclusive access on the selected node(s) to the spe...

Page 62: ...10000 PSID_LOG_COMM: General daemon communication; 0x0020000 PSID_LOG_OPTION: Option handling; 0x0040000 PSID_LOG_INFO: Handling of info request messages; 0x0080000 PSID_LOG_PART: Partition creation and mana...

Page 63: ...2 MCAST_LOG_INTR: Interrupted syscalls; 0x0004 MCAST_LOG_CONN: T_CLOSE and new pings; 0x0008 MCAST_LOG_5MIS: Every 5th missing ping; 0x0010 MCAST_LOG_MSNG: Every missing ping; 0x0020 MCAST_LOG_MSNG: Every rece...

Page 64: ...s only comes into play if the user does not define a sorting strategy explicitly via PSI_NODES_SORT. Be aware of the fact that using a batch system like PBS or LSF will set the strategy explicitly, na...

Page 65: ...ee also parastation.conf(5). rdpTimeout ms nodes: Set the RDP timeout in ms for all selected nodes. See also parastation.conf(5). deadLimit num nodes: Set the dead limit of the RDP status module. After this n...

Page 66: ...untime. Files: Upon startup, psiadmin tries to find psiadminrc in the current directory or in the user's home directory. The first file found is parsed and the directives within are executed. Afterwards ps...

Page 67: ...must always run with root privileges. Before a process can communicate with the ParaStation system, it has to register with the daemon. Access may be granted or denied. The daemon can deny the access due...

Page 68: ...ebug command of psiadmin(1). Be aware of the fact that high values of level lead to excessive debugging output, flooding syslog(3) or the logfile. f configfile file: Choose file to be the ParaSta...

Page 69: ...filename. Description: test_config reads and analyses the ParaStation4 configuration file. Any errors or anomalies are reported. By default, the configuration file /etc/parastation.conf will be used. Options...

Page 71: ...y node has received data from any node, i.e. an all-to-all communication was executed, a success message is printed and test_nodes exits. Otherwise, after a certain timeout, a message concerning the current...

Page 73: ...a cluster. Synopsis: test_pse -np num. Description: This command spawns num processes within the cluster. It's intended to test the process spawning capabilities of ParaStation. It does not test any communi...

Page 75: ...Display information for sockets and network connections using the ParaStation4 protocol p4sock Options s sock Display information about open p4sock sockets n net Display information of network connect...

Page 77: ...ib64 must be pre-loaded by both processes using export LD_PRELOAD=/opt/parastation/lib64/libp4tcp.so. For parallel and serial tasks launched by ParaStation, this environment variable is exported to all p...

Page 79: ...d debug flag Print debug information Pattern can be a combination of the following bits Pattern Description 0x010 More warning messages 0x020 Show process information start exit 0x040 Show received m...

Page 80: ...e Define that a core file should be written in case of a catastrophe. By default the core file will be written to tmp. coredir dir: Defines where to save core files. v version: Output version information a...

Page 81: ...ng output h human Print times and timestamps in more human readable form nh noheader Suppress headers st stotopt optstring Defines columns displayed within the user list group list and the total summa...

Page 82: ...e job list is sorted by. Valid entries are user, group, jobid, jobname, start, end, walltime, qtime, mem, vmem, cputime, queue, procs and exit. usort criteria: Selects the criterion by which the user list is sorted. V...

Page 83: ...roup (group list) or as a total summary of all jobs. Multiple lists can be selected; by default, all information is shown. Lists may be sorted by columns and may be filtered to only show information about a...

Page 84: ...trator s Guide These column names may also be used for sorting lists where applicable Files var account var account gz var account bz2 Accounting files one per day HOME psaccviewrc Initialization file...

Page 85: ...pinning bars Instead a detailed message about each received multicast ping is displayed m mcast MCAST Listen to multicast group MCAST Set this to the value of MCastGroup in the ParaStation configurati...

Page 87: ...m-5.0.0-0.i586.rpm; rpm -U pscom-modules-5.0.0-0.i586.rpm; rpmbuild --rebuild psmpi2-5.0.0-1.src.rpm; rpm -U psmpi2-5.0.0-1.i586.rpm. The psmgmt package must be installed before the pscom package may be built...

Page 88: ...ation bin psiadmin psiadmin add Alternatively you can start psiadmin 1 with the s option To install the ParaStation daemon as a system service started up at boot time use chk_config a etc init d paras...

Page 89: ...ever the opportunity to use the software according to this license one time for a limited period of three (3) months. It is acknowledged that ParTec has invested a massive amount of labour and financial...

Page 90: ...ality obligation which complies with this agreement. 3. Furthermore, Licensee promises not to publish the Software as object code or as source code, nor the corresponding comments, either totally or in par...

Page 91: ...indirect or subsequent damages due to errors of the licensed Software. 2. ParTec is not aware of any rights of third parties which would oppose University Use or Commercial Use. ParTec is not liable, howe...

Page 92: ...nternational Sale of Goods CISG and International Private Law Attachment I Declaration of Origin Material covered by this certificate version release etc ______________________________________________...

Page 93: ...features like process pinning should be used, adjust the existing configuration file. Look for pinProcs, CPUmap, bindMem, supplGrps and RLimit Core entries in the new template file parastation.conf.tmpl c...

Page 94: ...raStation4 can be run using the new mpiexec command. In this case the option -b or --bnr is required. The environment variable PSP_P4SOCK was renamed to PSP_P4S, but is still recognized. Within this version of...

Page 95: ...MPI. This task will not be accounted within the ParaStation process management, i.e. it will not allocate a dedicated CPU. Thus administration tasks may be started in addition to parallel tasks. See also S...

Page 96: ...different memory addresses may vary. Parallel Task: A bunch of processes distributed within the cluster forming an instance of a parallel application. E.g. an MPI program running on several nodes of a clus...

Page 97: ...e compute nodes within the cluster. This process does not communicate with other processes using MPI. ParaStation knows about this process and where it is started from. A serial task may use multiple thr...
