Hi,

Here is a summary of what is and what is not working, using today's xrootd
CVS version for the client code and version 20041118-0948_dbg for the
servers at SLAC (restarted this afternoon).


With a given server (bbrprod0X)
===============================

OK:
- xrdcp, including creation of non-existing directories and file
permissions (thanks Fabrizio)

- checksums using XrdClientAdmin::XrdGetChecksum; the buffer overrun is
fixed (thanks Fabrizio)

   CAVEAT: the connection times out while the checksum is being calculated, see
   http://www.slac.stanford.edu/cgi-bin/lwgate/XROOTD-L/archives/xrootd-l.200411/Subject/article-88.html
   This will likely become an issue once we start production with hundreds
   of jobs.

- changing permissions and removing files

- listing directories (thanks Wilko)

Not working:
- removing directories



With the load balancer (oprserv08)
==================================

Partially working:
- reading collections with the standard access methods works, *except* for
files written in the last 8(?) hours if the load balancer had been asked
for the file before it existed. For example, this collection was copied
to bbrprod01, and KanCollUtil finds it there:

KanCollUtil root://bbrprod01:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/23/BCharmoniumToHad_2379
root://bbrprod01:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/23/BCharmoniumToHad_2379 (48609 events)


but asking the load balancer does not return the file:

KanCollUtil root://oprserv08:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/23/BCharmoniumToHad_2379
2004-11-23 16:23:22 7642 Err : TXNetConn::Open                - Server [oprserv08.slac.stanford.edu:1094] did not return OK message for last request.
2004-11-23 16:23:22 7642 Err : TXNetConn::SendGenCommand      - Server declared error 3005: 'No servers are available to read the file.'
2004-11-23 16:23:22 7642 Err : TXNetFile::CreateTXNf          - Error opening the file //prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/23/BCharmoniumToHad_2379.01.root on host oprserv08.slac.stanford.edu:1094
ERR Could not open a file expected to contain the event header:
ERR    LFN = root://oprserv08:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/23/BCharmoniumToHad_2379.01.root
ERR    PFN = root://oprserv08:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/23/BCharmoniumToHad_2379.01.root
ERR Check collection name and access method...




Not working at all:
- XrdClientAdmin::XrdExistFiles, XrdClientAdmin::XrdDirList and
XrdClientAdmin::XrdGetChecksum only look at the load balancer itself,
i.e. by definition they do not find anything

- xrdcp, XrdClientAdmin::XrdChmod and XrdClientAdmin::XrdRm hang, with the
load balancer asking the client to wait again and again:
041123 16:36:20 001 Xrd: SendGenCommand Server [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:25 001 Xrd: SendGenCommand Server [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:30 001 Xrd: SendGenCommand Server [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:35 001 Xrd: SendGenCommand Server [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:40 001 Xrd: SendGenCommand Server [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:45 001 Xrd: SendGenCommand Server [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
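The hang above is consistent with the client honoring the server's wait
request with no upper limit. A minimal sketch of a bounded alternative
follows. `Reply`, `SendWithBoundedWait`, `SleepSeconds` and the `send` /
`sleepFor` callbacks are hypothetical names for illustration only; they are
not part of the XrdClient API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical outcome of one request (not the real XrdClient types).
struct Reply {
    enum Kind { Ok, Error, Wait } kind;
    int waitSeconds;  // only meaningful when kind == Wait
};

// Sends a request through the hypothetical `send` callback and honors
// server-requested waits, but gives up after `maxWaits` of them instead of
// looping forever. Returns true only if the server eventually answers OK.
bool SendWithBoundedWait(const std::function<Reply()>& send, int maxWaits,
                         const std::function<void(int)>& sleepFor) {
    assert(maxWaits >= 0);
    for (int waits = 0;; ++waits) {
        Reply r = send();
        if (r.kind == Reply::Ok) return true;
        if (r.kind == Reply::Error) return false;
        if (waits >= maxWaits) return false;  // waited long enough: report failure
        sleepFor(r.waitSeconds);
    }
}

// Real sleeper for production use; tests can pass a no-op instead.
void SleepSeconds(int s) { std::this_thread::sleep_for(std::chrono::seconds(s)); }
```

With maxWaits = 6 and a server that always answers with a 5 second wait,
the call would give up after about 30 seconds of waiting instead of
hanging.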



In summary, we are close to getting production going if we talk directly
to individual servers. This is clearly less than we had hoped for. Load
balancing is done on a random basis, without taking server load or disk
space into consideration. Finding files, getting checksums and removing
files all require asking every server, which increases the overall load
on the system.
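As long as the load balancer only looks at itself, a client that wants to
find a file has to poll every data server. A minimal sketch of that
fan-out, assuming a hypothetical `exists` callback that wraps whatever
per-server check the client uses (this is not the actual
XrdClientAdmin::XrdExistFiles signature; `FindFile` and `ExistsFn` are
illustrative names):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-server existence check supplied by the caller.
using ExistsFn =
    std::function<bool(const std::string& server, const std::string& path)>;

// Client-side lookup without a working redirector: ask every data server
// in turn and collect those that hold `path`. Costs one round trip per server.
std::vector<std::string> FindFile(const std::vector<std::string>& servers,
                                  const std::string& path,
                                  const ExistsFn& exists) {
    assert(static_cast<bool>(exists));
    std::vector<std::string> holders;
    for (const auto& server : servers) {
        if (exists(server, path)) holders.push_back(server);
    }
    return holders;
}
```

Every lookup costs one request per server, which is where the extra load
on the system comes from.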

In general, a lot of functionality is implemented on the client side that
IMHO should be provided by the server. For example, xrdcp issues a
separate mkdir command for each directory instead of just asking the
server to create the whole path. Likewise, since no wildcard operations
currently work, we need to get a list of files from the server and then
ask the server to remove each file individually.
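The two client-side workarounds described above look roughly like this.
This is a minimal sketch: `ParentDirs` and `RemoveMatching` are hypothetical
helpers, the directory listing and the per-file remove are passed in rather
than taken from the real XrdClientAdmin calls, and the wildcard matching
uses POSIX `fnmatch`.

```cpp
#include <cassert>
#include <cstddef>
#include <fnmatch.h>  // POSIX wildcard matching
#include <functional>
#include <string>
#include <vector>

// Every directory prefix of `filePath`; a client that creates the path
// itself needs one mkdir round trip per entry.
std::vector<std::string> ParentDirs(const std::string& filePath) {
    std::vector<std::string> dirs;
    std::string::size_type pos = 0;
    while ((pos = filePath.find('/', pos + 1)) != std::string::npos) {
        dirs.push_back(filePath.substr(0, pos));
    }
    return dirs;
}

// Client-side "rm <pattern>": match the names from a directory listing
// locally and issue one remove request per match. Returns the number of
// files the (hypothetical) removeFile callback reports as removed.
std::size_t RemoveMatching(const std::vector<std::string>& listing,
                           const std::string& pattern,
                           const std::function<bool(const std::string&)>& removeFile) {
    assert(static_cast<bool>(removeFile));
    std::size_t removed = 0;
    for (const auto& name : listing) {
        if (fnmatch(pattern.c_str(), name.c_str(), 0) == 0 && removeFile(name)) {
            ++removed;
        }
    }
    return removed;
}
```

For /prod/store/SPskims/f.root, ParentDirs yields /prod, /prod/store and
/prod/store/SPskims, i.e. three mkdir round trips where a single
server-side "create the whole path" request would do.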

Cheers,
		Remi


---------------------------------------------------------------------
Q: How many kinds of physicists are there?
A: Three. Those who can count and those who can't.

*********************************************************************
Remigius K. Mommsen
University of California, Irvine       URL:    http://cern.ch/mommsen
c/o SLAC                             voice:        ++1 (650) 926-3595
2575 Sand Hill Road #35                fax:        ++1 (650) 926-3882
Menlo Park, CA 94025, US              home:        ++1 (650) 233-9041
*********************************************************************