
In the second test (see below), we saw a performance gap between PUT (PtP) and GET (PtG). So I added two more GridFTP servers (of the same 'osgserv' class as the machines listed below) and repeated a few of the tests. Everything else in the test environment remained the same.

srmPrepareToGet:
  1) 5000 PtG, no pause. All completed. Total 710 seconds (~7 PtG/sec). TCP connections < 10 at all times.
  4) 5000 PtG, no pause. All completed. Total 709 seconds (~7 PtG/sec). TCP connections < 10 at all times.
  CPU usage < 100% on the BeStMan-xrootd host.

srmPrepareToPut:
  2) 5000 PtP, sleep 5 seconds after every 100 PtP. All completed.
  3) 5000 PtP, no pause. All completed. Total 885 seconds (~5.6 PtP/sec).
     TCP connections: < 20 for the first 1000 PtP, < 350 during PtP 1000-3000, < 20 during PtP 3000-5000 (are there competing activities?).
  CPU usage 100-200%.

Note: memory usage (RES) rose from 150MB to 327MB during 1), then to 342MB during 2), and stayed at 343MB during 3) and 4).
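The rates above are simply request count divided by wall-clock time. The sketch below redoes that arithmetic with awk and sets it next to the per-call submission times quoted in the original message (100 PtP in 16.4 s, 100 PtG in 14.3 s on lxplus). `rate` is a hypothetical helper name, and the comparison assumes those lxplus timings still held for these runs.

```shell
#!/bin/bash
# rate COUNT SECONDS -> requests per second, one decimal place.
# awk is used because bash arithmetic is integer-only.
rate() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# Measured totals from the follow-up runs.
echo "PtG run 1): $(rate 5000 710) PtG/sec"
echo "PtG run 4): $(rate 5000 709) PtG/sec"
echo "PtP run 3): $(rate 5000 885) PtP/sec"

# Client-side submission rate quoted in the original test notes
# (assumed unchanged for these runs).
echo "lxplus PtP submission: $(rate 100 16.4) PtP/sec"
echo "lxplus PtG submission: $(rate 100 14.3) PtG/sec"
```

If that assumption holds, the PtG runs (about 7 PtG/sec) ran at roughly the rate lxplus itself could issue requests, so the PtG figure may reflect the client rather than BeStMan-xrootd; the PtP run stayed somewhat below the client's ~6.1 PtP/sec.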
                             
I _think_ the reason for the earlier PtP/PtG performance gap is that Xrootd storage imposes a 5-second delay when creating new files. The GridFTP servers may therefore have been overrun by the incoming PtP requests, which in turn caused a large number of glite-url-copy processes to accumulate on the submission host (lxplus202.cern.ch, dual dual-core Intel Xeon 5150 2.66GHz, 8GB). Eventually, BeStMan-xrootd could no longer communicate with the submission host.
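The 5-second creation delay can be turned into a rough concurrency estimate with Little's law (average requests in flight = arrival rate × time each request spends in the system). This sketch assumes the no-pause lxplus submission rate from the test notes (100 PtP in 16.4 s) and treats the 5-second delay as a lower bound on each transfer's time in the system; `in_flight` is a hypothetical helper name.

```shell
#!/bin/bash
# Little's law: L = lambda * W.
# in_flight COUNT SECONDS DELAY
#   COUNT / SECONDS is the arrival rate (lambda),
#   DELAY is the time each request spends in the system (W), in seconds.
in_flight() {
    awk -v n="$1" -v t="$2" -v w="$3" 'BEGIN { printf "%.1f\n", (n / t) * w }'
}

# Arrival rate: 100 PtP per 16.4 s from lxplus with no pause (assumed).
# Time in system: at least the 5 s Xrootd file-creation delay.
echo "PtP in flight (lower bound): $(in_flight 100 16.4 5)"
```

That gives roughly 30 PtP in flight even if nothing queues. Little's law only applies when the servers keep up; if they cannot, there is no steady state and the backlog keeps growing, which is the kind of pile-up described above.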

--
Wei Yang  |  [log in to unmask]  |  650-926-3338(O)  

> -----Original Message-----
> From: Yang, Wei 
> Sent: Friday, December 14, 2007 11:39 PM
> To: 'SRM Support at LBNL'
> Cc: Michael Ernst; Hironori Ito; Bruce Mellado; Patrick 
> McGuigan; Gowdy, Stephen; Hanushevsky, Andrew B.
> Subject: Stress test result on BeStMan-xrootd (second attempt)
> 
> Test Environment:
> ================
> BeStMan-xrootd 2.2.0.6b1 and XrootdFS: dual dual-core AMD 
> Opteron 275, 2.2GHz, 4GB memory, 1Gbit NIC. MAX_JAVA_HEAP=2048MB
> 
> GridFTP: osgserv01,04: dual AMD Opteron 244 1.8GHz (two 
> cores total), 2GB memory, 1Gbit NIC.
>               griddev05,6,7: dual Intel Pentium III 866MHz, 
> 1GB memory, 100Mbit NIC.
>  
>               BeStMan-xrootd load-balances these GridFTP 
> servers using a round-robin algorithm. Each 'osgserv' host 
> is listed 3 times in the BeStMan-xrootd configuration file, 
> and each 'griddev' host once. 
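Repeating a host in the list gives it a proportionally larger share under plain round robin. A sketch of that effect in bash, assuming the list is ordered osgserv01 ×3, osgserv04 ×3, then griddev05, griddev06, griddev07; the real BeStMan-xrootd configuration syntax and ordering are not shown here, and `SERVERS`/`pick` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical server list: 'osgserv' hosts repeated 3 times, 'griddev' once.
SERVERS=()
for h in osgserv01 osgserv04; do
    for _ in 1 2 3; do SERVERS+=("$h"); done
done
SERVERS+=(griddev05 griddev06 griddev07)

# Plain round robin: request r (0-based) goes to list entry r mod N.
pick() {
    echo "${SERVERS[$(( $1 % ${#SERVERS[@]} ))]}"
}

# Tally where 900 requests would go.
declare -A count
for (( r = 0; r < 900; r++ )); do
    host=$(pick "$r")
    count[$host]=$(( ${count[$host]:-0} + 1 ))
done
for h in "${!count[@]}"; do echo "$h ${count[$h]}"; done | sort
```

Each 'osgserv' host ends up with 3/9 of the requests (300 of 900) and each 'griddev' host with 1/9 (100 of 900), so the two faster machines take two thirds of the load. With this grouped ordering, three consecutive requests land on the same host; interleaving the entries would spread bursts out without changing the long-run shares.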
> 
> Job submission:  lxplus202.cern.ch. glite-url-copy was used 
> to simulate FTS. On average, lxplus can run 100 
> glite-url-copy PtP requests in 16.4 seconds and 100 PtG 
> requests in 14.3 seconds.
> 
> Note: Compared to the previous test, we reduced the file size 
> from 1.1MB to 79 bytes. For the SRM host, we more than 
> doubled the CPU power and quadrupled the memory, and this 
> host also has fewer competing activities.
> 
> Test Tool:
> ========= 
> 
> Run glite-url-copy from lxplus.cern.ch in a script like this:
>  
> #!/bin/bash
> # Arguments: FIRST LAST N
> #   one request per index FIRST..LAST, with a 5 s sleep after every N
> 
> JOBIDS=/tmp/glite-url-copy-job-ids.yangw
> declare -i a n
> n=${3:?usage: $0 FIRST LAST N}   # add a pause (sleep) every n glite-url-copys
>  
> for i in $(seq -w "$1" "$2"); do
>      # PtP variant (upload /tmp/junk); uncomment to test PrepareToPut:
> #    glite-url-copy -1 httpg://osgserv01.slac.stanford.edu:8443/srm/v2/server \
> #    -I 2.2 file:///tmp/junk \
> #    srm://osgserv01.slac.stanford.edu:8443/srm/v2/server\?SFN=/xrootd/atlas/usr/y/yangw/dir$i/junk \
> #    >> $JOBIDS
>  
>      # PtG variant (download to /tmp/ajunk.$i):
>      glite-url-copy -1 httpg://osgserv01.slac.stanford.edu:8443/srm/v2/server \
>      -I 2.2 \
>      srm://osgserv01.slac.stanford.edu:8443/srm/v2/server\?SFN=/xrootd/atlas/usr/y/yangw/dir$i/junk \
>      file:///tmp/ajunk.$i \
>      >> "$JOBIDS"
>  
>      a=$((10#$i))   # strip the leading zeros added by 'seq -w'
>      if (( a % n == 0 )); then sleep 5; fi
> done
>  
> Test Monitoring:
> ===============
> 
> On the BeStMan-xrootd host, we run 'top' to monitor CPU and 
> memory usage, and the command 'netstat -t | grep lxplus | 
> grep ESTABLISHED | wc -l' to count active TCP connections, 
> which may indicate how far BeStMan-xrootd lags behind the 
> incoming requests.
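Running the netstat count by hand gives only snapshots. A minimal sketch of a periodic sampler, assuming the same `netstat -t` output (with host names resolved, so 'lxplus' matches) is available on the SRM host; `count_established` and `monitor` are hypothetical helper names, not part of the original setup.

```shell
#!/bin/bash
# count_established PATTERN
#   Reads 'netstat -t' style lines on stdin and prints how many are
#   ESTABLISHED connections whose line contains PATTERN.
count_established() {
    grep -- "$1" | grep -c ESTABLISHED
}

# monitor PATTERN INTERVAL SAMPLES
#   Prints "HH:MM:SS count" every INTERVAL seconds, SAMPLES times.
monitor() {
    local pattern=$1 interval=$2 samples=$3 i
    for (( i = 0; i < samples; i++ )); do
        echo "$(date '+%H:%M:%S') $(netstat -t | count_established "$pattern")"
        sleep "$interval"
    done
}
```

For example, `monitor lxplus 10 360 > tcp-conn.log` samples every 10 seconds for an hour, and `sort -n -k2 tcp-conn.log | tail -1` then shows the peak. Keeping the counting separate from `netstat` makes it easy to check against saved output.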
> 
> Test result of PrepareToPut():
> =============================
> (Each file is put into a non-existent directory; the GridFTP 
> server creates both the directory and the file.)
>  
> 1) 500 PtP, no pause. All completed.
> 2) 1000 PtP, no pause. All completed.
> 3) 5000 PtP, sleep 5 seconds after every 100 PtP. Only 3300 
> completed; BeStMan-xrootd hung with the following error:
> 
> - Error processing request: Connection reset
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>         at org.globus.gsi.gssapi.SSLUtil.read(SSLUtil.java:31)
>         at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readToken(GSIGssInputStream.java:58)
>         at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:48)
>         at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:54)
>         at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:117)
>         at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:137)
>         at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:155)
>         at org.globus.ogsa.server.GSIServiceThread.process(GSIServiceThread.java:99)
>         at org.globus.ogsa.server.ServiceThread.run(ServiceThread.java:185)
> 
> 'netstat...' showed 600-700 TCP connections before hanging.
> 
> 4) 5000 PtP, sleep 5 seconds for every 50 PtP. All completed. 
> 
> Note: Most of the time, 'top' showed 100-200% CPU usage, < 332MB 
> resident memory (RES) and 2.3GB virtual memory (VIRT). 
> Note: For 4), 'netstat...' showed < 35 TCP connections right 
> before the run finished.
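The pause in the submission script lowers the offered PtP rate. A sketch of that arithmetic, assuming each glite-url-copy call takes a constant 0.164 s (100 PtP in 16.4 s, from the test notes above) and that the 5-second sleep is the only added delay; `offered_rate` is a hypothetical helper name.

```shell
#!/bin/bash
# offered_rate PER_CALL N PAUSE
#   N calls of PER_CALL seconds each, followed by a PAUSE-second sleep:
#   offered rate = N / (N * PER_CALL + PAUSE) requests per second.
offered_rate() {
    awk -v c="$1" -v n="$2" -v p="$3" \
        'BEGIN { printf "%.1f\n", n / (n * c + p) }'
}

echo "5 s pause every 100 PtP (test 3): $(offered_rate 0.164 100 5) PtP/sec"
echo "5 s pause every 50 PtP  (test 4): $(offered_rate 0.164 50 5) PtP/sec"
```

Under those assumptions, test 4) offered roughly 3.8 PtP/sec against about 4.7 PtP/sec for test 3), i.e. about a fifth less load; these runs do not establish whether that difference alone explains why 4) completed.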
> 
> Test result for PrepareToGet():
> ==============================
> 5) 500 PtG, no pause. All completed.
> 6) 1000 PtG, no pause. 980 completed; 20 GridFTP errors or timeouts.
> 7) 5000 PtG, sleep 5 seconds after every 200 PtG. 4947 
> completed; 53 GridFTP errors or timeouts.
> 
> Note: At any given time during 6) and 7), 'netstat...' showed 
> no more than 15 TCP connections. 
> Note: I think the GridFTP errors and timeouts can be ignored, 
> because they occurred after glite-url-copy had already 
> finished its PtG exchange with BeStMan-xrootd.
> 
> --
> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
>