ATLAS-SCCS-PLANNING-L Archives

ATLAS-SCCS-PLANNING-L@LISTSERV.SLAC.STANFORD.EDU


ATLAS-SCCS-PLANNING-L  December 2007

Subject: Stress test result on BeStMan-xrootd (third attempt)
From: "Yang, Wei" <[log in to unmask]>
Date: Sun, 16 Dec 2007 00:10:05 -0800
Content-Type: text/plain

In the second test (see below), we saw a performance gap between PUT and GET. So I added two more GridFTP servers ('osgserv' class machines, see below) and ran a few tests again. The rest of the test environment remained the same.

srmPrepareToGet:  1) 5000 PtG, no pause. All completed. Total 710 seconds (~7 PtG/sec). TCP connections < 10 at all times.
                  4) 5000 PtG, no pause. All completed. Total 709 seconds (~7 PtG/sec). TCP connections < 10 at all times.
                  CPU usage < 100% on the BeStMan-xrootd host.

srmPrepareToPut:  2) 5000 PtP, sleep 5 seconds for every 100 PtP. All completed.
                  3) 5000 PtP, no pause. All completed. Total 885 seconds (~5.6 PtP/sec).
                     TCP connections: < 20 for the first 1000 PtP, < 350 during PtP 1000-3000, < 20 during PtP 3000-5000 (are there competing activities?)
                  CPU usage 100-200%.

Note: memory usage (RES) rose from 150MB to 327MB during 1), rose to 342MB during 2), and stayed at 343MB during 3) and 4).
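
The per-second rates quoted above follow directly from the request counts and elapsed times. A minimal sketch of that arithmetic, assuming awk is available; the function name `rate` is illustrative and not part of any tool used in the test. The last two lines use the lxplus submission figures from the quoted second-attempt message.

```bash
#!/bin/bash
# rate COUNT SECONDS: print requests per second with two decimals.
rate() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n / t }'
}

rate 5000 710    # PtG run 1): 5000 requests in 710 seconds
rate 5000 885    # PtP run 3): 5000 requests in 885 seconds
rate 100 16.4    # lxplus submission: 100 PtP in 16.4 seconds
rate 100 14.3    # lxplus submission: 100 PtG in 14.3 seconds
```

Rounding to two decimals is an arbitrary choice made here for readability.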
                             
I _think_ the reason for the previous PtP and PtG performance gap was that Xrootd storage imposes a 5-second delay when creating new files. So the GridFTP servers may have been overrun by the incoming PtP requests, which in turn caused a large number of glite-url-copy processes to accumulate at the submission host (lxplus202.cern.ch, dual dual-core Intel Xeon 5150 2.66GHz, 8GB). Eventually, BeStMan-xrootd couldn't communicate with the submission host.
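
A rough back-of-the-envelope sketch of why a fixed delay can pile up work, not a measurement: if each PtP is held for the full 5-second file-creation delay and requests arrive at about 6.1 per second (the lxplus submission rate of 100 PtP in 16.4 seconds from the quoted message), then Little's law (requests in progress = arrival rate x time held) gives the number of file creations in progress at once. Both inputs are assumptions taken from figures in this thread, and the function name `in_flight` is illustrative.

```bash
#!/bin/bash
# in_flight RATE DELAY: estimated number of requests in progress at once,
# using Little's law (in progress = arrival rate * time each request is held).
in_flight() {
    awk -v r="$1" -v w="$2" 'BEGIN { printf "%.1f\n", r * w }'
}

in_flight 6.1 5    # assumed: 6.1 PtP/sec arriving, each held for 5 seconds
```

The estimate covers only the file-creation delay; any further time spent in GridFTP or authentication would add to it.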

--
Wei Yang  |  [log in to unmask]  |  650-926-3338(O)  

> -----Original Message-----
> From: Yang, Wei 
> Sent: Friday, December 14, 2007 11:39 PM
> To: 'SRM Support at LBNL'
> Cc: Michael Ernst; Hironori Ito; Bruce Mellado; Patrick 
> McGuigan; Gowdy, Stephen; Hanushevsky, Andrew B.
> Subject: Stress test result on BeStMan-xrootd (second attempt)
> 
> Test Environment:
> ================
> BeStMan-xrootd 2.2.0.6b1 and XrootdFS: dual dual-core AMD Opteron 275, 2.2GHz, 4GB memory, 1Gbit NIC. MAX_JAVA_HEAP=2048MB
> 
> GridFTP: osgserv01,04: dual AMD Opteron 244 1.8GHz (two cores in total), 2GB memory, 1Gbit NIC.
>          griddev05,6,7: dual Intel Pentium III 866MHz, 1GB memory, 100Mbit NIC.
>  
>          BeStMan-xrootd load-balances across these GridFTP servers using a round-robin algorithm. We list each 'osgserv' host 3 times in the BeStMan-xrootd configuration file, and each 'griddev' host one time.
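
Listing a host several times in a plain round-robin list is a simple way to weight it. A minimal sketch of the effect, assuming the five hosts named above with 'griddev05,6,7' expanded to three host names and each 'griddev' host listed exactly once. The array layout, the ordering of entries, and the function name `pick_server` are illustrative; the actual BeStMan-xrootd configuration syntax is not shown in this thread.

```bash
#!/bin/bash
# Hypothetical server list: each osgserv host appears three times,
# each griddev host once (ordering in the real configuration is unknown).
servers=(
    osgserv01 osgserv01 osgserv01
    osgserv04 osgserv04 osgserv04
    griddev05 griddev06 griddev07
)

# pick_server K: print the server chosen for the K-th request (K counts from 0).
pick_server() {
    echo "${servers[$(( $1 % ${#servers[@]} ))]}"
}

# Count how many of 900 consecutive requests land on each host class.
osg=0; dev=0
for (( k = 0; k < 900; k++ )); do
    case "$(pick_server "$k")" in
        osgserv*) osg=$(( osg + 1 )) ;;
        griddev*) dev=$(( dev + 1 )) ;;
    esac
done
echo "osgserv: $osg, griddev: $dev"
```

With these weights, two thirds of the requests go to the two faster 'osgserv' hosts and one third to the three 'griddev' hosts.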
> 
> Job submission: lxplus202.cern.ch. Glite-url-copy was used to simulate FTS. On average, lxplus can run 100 glite-url-copy for PtP in 16.4 seconds and 100 for PtG in 14.3 seconds.
> 
> Note: Compared to the previous test, we reduced the file size from 1.1MB to 79 bytes. And for the SRM host, we more than doubled the CPU power and quadrupled the memory. This SRM host also has fewer competing activities.
> 
> Test Tool:
> ========= 
> 
> Run glite-url-copy from lxplus.cern.ch in a script like this
>  
> #!/bin/bash
> 
> JOBIDS=/tmp/glite-url-copy-job-ids.yangw
> declare -i a b c n
> n=$3       # add a pause (sleep) every n glite-url-copys
>  
> for i in `seq -w $1 $2`; do
>      # PtP variant (disabled): put file:///tmp/junk into dir$i
> #    glite-url-copy -1 httpg://osgserv01.slac.stanford.edu:8443/srm/v2/server \
> #    -I 2.2 file:///tmp/junk \
> #    srm://osgserv01.slac.stanford.edu:8443/srm/v2/server\?SFN=/xrootd/atlas/usr/y/yangw/dir$i/junk \
> #    >> $JOBIDS
>  
>      # PtG variant: get dir$i/junk into /tmp/ajunk.$i
>      glite-url-copy -1 httpg://osgserv01.slac.stanford.edu:8443/srm/v2/server \
>      -I 2.2 \
>      srm://osgserv01.slac.stanford.edu:8443/srm/v2/server\?SFN=/xrootd/atlas/usr/y/yangw/dir$i/junk \
>      file:///tmp/ajunk.$i \
>      >> $JOBIDS
>  
>      a=`echo $i | sed -r 's/^0+//g'`   # strip leading zeros so $i is not read as octal
>      b=a/n                             # integer division (variables are declared -i)
>      c=a-b*n                           # remainder of a divided by n
>      if [ $c -eq 0 ]; then sleep 5; fi
> done
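
The pause check in the script above is a remainder test. A minimal alternative sketch, assuming bash: the `10#` prefix forces base-10 parsing, so the zero-padded counters from `seq -w` need no `sed` step, and a guard skips the pause when n is 0 or missing instead of dividing by zero. The function name `should_pause` is illustrative and not part of the original script.

```bash
#!/bin/bash
# should_pause I N: succeed when request number I is a multiple of N.
# I may carry leading zeros (as produced by `seq -w`); the 10# prefix forces
# base 10 so that values such as 0100 or 0089 are not parsed as octal.
should_pause() {
    local i=$(( 10#$1 ))
    local n=${2:-0}
    [ "$n" -gt 0 ] || return 1    # n of 0 (or missing) means "never pause"
    (( i % n == 0 ))
}
```

Inside the loop this would read `if should_pause "$i" "$n"; then sleep 5; fi`.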
>  
> Test Monitoring:
> ===============
> 
> On the BeStMan-xrootd host, we run 'top' to monitor CPU and memory usage. And we use the command 'netstat -t | grep lxplus | grep ESTABLISHED | wc -l' to monitor the number of active TCP connections, which may indicate how far BeStMan-xrootd lags behind the incoming requests.
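
To follow the connection count over a whole run rather than checking it by hand, the same pipeline can be wrapped in a small function and sampled periodically. A minimal sketch, assuming bash and standard grep; the function name `count_established`, the 10-second interval, and the timestamp format are illustrative choices, not what was used in the test.

```bash
#!/bin/bash
# count_established PATTERN: read `netstat -t` output on stdin and print the
# number of ESTABLISHED connections whose line matches PATTERN.
count_established() {
    grep -- "$1" | grep -c ESTABLISHED
}

# Sampling loop (left commented out so that sourcing this file has no side effects):
# while true; do
#     echo "$(date +%H:%M:%S) $(netstat -t | count_established lxplus)"
#     sleep 10
# done
```

`grep -c` replaces `wc -l` here; both count the matching lines.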
> 
> Test result of PrepareToPut():
> =============================
> (File will be put to a non-existing directory. GridFTP server 
> creates the file and directory)
>  
> 1) 500 PtP, no pause. All completed
> 2) 1000 PtP, no pause. All completed.
> 3) 5000 PtP, sleep 5 seconds for every 100 PtP, 3300 
> completed. BeStMan-xrootd hung with error:
> 
> - Error processing request: Connection reset
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>         at org.globus.gsi.gssapi.SSLUtil.read(SSLUtil.java:31)
>         at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readToken(GSIGssInputStream.java:58)
>         at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:48)
>         at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:54)
>         at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:117)
>         at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:137)
>         at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:155)
>         at org.globus.ogsa.server.GSIServiceThread.process(GSIServiceThread.java:99)
>         at org.globus.ogsa.server.ServiceThread.run(ServiceThread.java:185)
> 
> 'netstat...' showed 600-700 TCP connections before hanging.
> 
> 4) 5000 PtP, sleep 5 seconds for every 50 PtP. All completed. 
> 
> Note: Most of the time 'top' showed 100-200% CPU usage, < 332MB memory (RES) and 2.3GB (VIRT).
> Note: For 4), 'netstat...' showed < 35 TCP connections right before finishing.
> 
> Test result for PrepareToGet():
> ==============================
> 5) 500 PtG, no pause. All completed
> 6) 1000 PtG, no pause. 980 completed. 20 GridFTP errors or timeouts.
> 7) 5000 PtG, sleep 5 seconds for every 200 PtG. 4947 completed. 53 GridFTP errors or timeouts.
> 
> Note: at any given time during 6) and 7), 'netstat...' showed no more than 15 TCP connections.
> Note: I think the GridFTP errors and timeouts can be ignored because they happened at a stage after glite-url-copy had finished the PtG exchange with BeStMan-xrootd.
> 
> --
> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
> 


