In the second test (see below), we saw a performance gap between PUT
and GET. So I added two more GridFTP servers (the 'osgserv' class
machines described below) and ran a few of the tests again. The rest of
the test environment remained the same.

srmPrepareToGet:

1) 5000 PtG, no pause. All completed. Total 710 seconds (~ 7 PtG/sec).
   TCP connections < 10 at all times.
4) 5000 PtG, no pause. All completed. Total 709 seconds (~ 7 PtG/sec).
   TCP connections < 10 at all times. CPU usage < 100% on the
   BeStMan-xrootd host.

srmPrepareToPut:

2) 5000 PtP, sleep 5 seconds every 100 PtP. All completed.
3) 5000 PtP, no pause. All completed. Total 885 seconds
   (~ 5.6 PtP/sec). TCP connections: < 20 for the first 1000 PtP,
   < 350 during 1000-3000 PtP, < 20 during 3000-5000 PtP (were there
   competing activities?). CPU usage 100 - 200%.

Note: memory usage (RES) rose from 150MB to 327MB during 1), then rose
to 342MB during 2), and stayed at 343MB during 3) and 4).

I _think_ the reason for the previous PtP and PtG performance gap was
that Xrootd storage imposes a 5-second delay when creating new files.
So the GridFTP servers may have been overrun by the incoming PtP
requests, which in turn caused a large number of glite-url-copy
processes to accumulate at the submission host (lxplus202.cern.ch, dual
dual-core Intel Xeon 5150 2.66GHz, 8GB). Eventually, BeStMan-xrootd
couldn't communicate with the submission host.

--
Wei Yang | [log in to unmask] | 650-926-3338(O)

> -----Original Message-----
> From: Yang, Wei
> Sent: Friday, December 14, 2007 11:39 PM
> To: 'SRM Support at LBNL'
> Cc: Michael Ernst; Hironori Ito; Bruce Mellado; Patrick McGuigan;
>     Gowdy, Stephen; Hanushevsky, Andrew B.
> Subject: Stress test result on BeStMan-xrootd (second attempt)
>
> Test Environment:
> ================
> BeStMan-xrootd 2.2.0.6b1 and XrootdFS: dual dual-core AMD Opteron 275,
>          2.2GHz, 4GB memory, 1 Gbit NIC. MAX_JAVA_HEAP=2048MB
>
> GridFTP: osgserv01,04: dual AMD Opteron 244 1.8GHz (total two cores),
>          2GB memory, 1 Gbit NIC.
>          griddev05,6,7: dual Intel Pentium III 866MHz, 1GB memory,
>          100Mbit NIC.
>
>          BeStMan-xrootd load-balances across these GridFTP servers
>          using a round-robin algorithm. We list each 'osgserv' host
>          three times in the BeStMan-xrootd configuration file and
>          each 'griddev' host once.
>
> Job submission: lxplus202.cern.ch. Glite-url-copy was used to
> simulate FTS. On average, lxplus can run 100 glite-url-copy for PtP
> in 16.4 seconds and 100 PtG in 14.3 seconds.
>
> Note: Compared to the previous test, we reduced the file size from
> 1.1MB to 79 bytes. And for the SRM host, we more than doubled the CPU
> power and quadrupled the memory. This SRM host also has fewer
> competing activities.
>
> Test Tool:
> =========
>
> Run glite-url-copy from lxplus.cern.ch in a script like this:
>
> #!/bin/bash
> # Usage: script FIRST LAST N  (run requests FIRST..LAST, pausing every N)
>
> JOBIDS=/tmp/glite-url-copy-job-ids.yangw
> declare -i a b c n   # integer variables: assignments are evaluated arithmetically
> n=$3 # add a pause (sleep) every n glite-url-copys
>
> for i in `seq -w $1 $2`; do
>     # PtP variant (commented out): copy a local file to the SRM
> #   glite-url-copy -1 httpg://osgserv01.slac.stanford.edu:8443/srm/v2/server \
> #       -I 2.2 file:///tmp/junk \
> #       srm://osgserv01.slac.stanford.edu:8443/srm/v2/server\?SFN=/xrootd/atlas/usr/y/yangw/dir$i/junk \
> #       >> $JOBIDS
>
>     # PtG variant: copy a file from the SRM to a local file
>     glite-url-copy -1 httpg://osgserv01.slac.stanford.edu:8443/srm/v2/server \
>         -I 2.2 \
>         srm://osgserv01.slac.stanford.edu:8443/srm/v2/server\?SFN=/xrootd/atlas/usr/y/yangw/dir$i/junk \
>         file:///tmp/ajunk.$i \
>         >> $JOBIDS
>
>     # Strip the zero padding added by 'seq -w' so the value is not read as octal
>     a=`echo $i | sed -r 's/^0+//g'`
>     b=a/n
>     c=a-b*n          # c = a mod n
>     if [ $c -eq 0 ]; then sleep 5; fi
> done
>
> Test Monitoring:
> ===============
>
> On the BeStMan-xrootd host, we run 'top' to monitor CPU and memory
> usage. And we use the command
> 'netstat -t | grep lxplus | grep ESTABLISHED | wc -l'
> to monitor the number of active TCP connections, which may indicate
> how much BeStMan-xrootd lags behind the incoming requests.
>
> Test result of PrepareToPut():
> =============================
> (File will be put to a non-existing directory.
> GridFTP server creates the file and directory)
>
> 1) 500 PtP, no pause. All completed.
> 2) 1000 PtP, no pause. All completed.
> 3) 5000 PtP, sleep 5 seconds for every 100 PtP. 3300 completed.
>    BeStMan-xrootd hung with error:
>
> - Error processing request: Connection reset
> java.net.SocketException: Connection reset
>     at java.net.SocketInputStream.read(SocketInputStream.java:168)
>     at org.globus.gsi.gssapi.SSLUtil.read(SSLUtil.java:31)
>     at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readToken(GSIGssInputStream.java:58)
>     at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:48)
>     at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:54)
>     at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:117)
>     at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:137)
>     at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:155)
>     at org.globus.ogsa.server.GSIServiceThread.process(GSIServiceThread.java:99)
>     at org.globus.ogsa.server.ServiceThread.run(ServiceThread.java:185)
>
>    'netstat...' showed 600-700 TCP connections before hanging.
>
> 4) 5000 PtP, sleep 5 seconds for every 50 PtP. All completed.
>
> Note: Most of the time 'top' showed 100-200% CPU usage and < 332 MB
> memory (RES) and 2.3GB (VIRT).
> Note: For 4) 'netstat...' showed < 35 TCP connections right before
> finishing.
>
> Test result for PrepareToGet():
> ==============================
> 5) 500 PtG, no pause. All completed.
> 6) 1000 PtG, no pause. 980 completed. 20 GridFTP errors or timeouts.
> 7) 5000 PtG, sleep 5 seconds for every 200 PtG. 4947 completed.
>    53 GridFTP errors or timeouts.
>
> Note: at any given time during 6) and 7), 'netstat...' showed no more
> than 15 TCP connections.
> Note: I think the GridFTP errors or timeouts can be ignored because
> they happened at a stage after glite-url-copy had finished the PtG
> exchange with BeStMan-xrootd.
>
> --
> Wei Yang | [log in to unmask] | 650-926-3338(O)
>
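
For readers unfamiliar with the trick of repeating a host in the server
list: it amounts to a weighted round robin. The sketch below only
illustrates that idea. It is not BeStMan-xrootd code or configuration
syntax. The SERVERS array and the pick_server function are hypothetical,
and the names griddev06 and griddev07 are an assumed expansion of
"griddev05,6,7".

```bash
#!/bin/bash
# Illustrative sketch of weighted round robin by repetition.
# Assumption: this array stands in for the server list; it is NOT the
# real BeStMan-xrootd configuration format.
SERVERS=(
  osgserv01 osgserv01 osgserv01   # listed three times
  osgserv04 osgserv04 osgserv04   # listed three times
  griddev05 griddev06 griddev07   # listed once each (names assumed)
)

# Print the server that would handle the k-th request (k starts at 0).
# Plain round robin over the list: entries that appear more often are
# selected proportionally more often.
pick_server() {
  local k=$1
  echo "${SERVERS[k % ${#SERVERS[@]}]}"
}

# Print how many of the first n requests each server would receive.
count_share() {
  local n=$1 k
  for ((k = 0; k < n; k++)); do
    pick_server "$k"
  done | sort | uniq -c
}
```

Calling 'count_share 9' lists one full cycle. By construction, in every
nine consecutive requests each 'osgserv' host is selected three times
and each 'griddev' host once, so the faster machines take one third of
the load each.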