Hi,
Here is a summary of what is and is not working, using today's xrootd CVS
version for the client code and version 20041118-0948_dbg for the
servers at SLAC (restarted this afternoon).
With a given server (bbrprod0X)
===============================
OK:
- xrdcp, including creation of non-existent directories and file
permissions (thanks Fabrizio)
- checksum using XrdClientAdmin::XrdGetChecksum; the buffer overrun is
fixed (thanks Fabrizio)
CAVEAT: the connection times out while the checksum is being calculated.
http://www.slac.stanford.edu/cgi-bin/lwgate/XROOTD-L/archives/xrootd-l.200411/Subject/article-88.html
This will likely become an issue once we start production with hundreds
of jobs
- changing permissions and removing files
- listing directories (thanks Wilko)
Not working:
- removing directories
With the load balancer (oprserv08)
==================================
- reading collections with standard access methods works *except* for
files written in the last 8(?) hours when the load balancer was asked
for the file before it existed. This collection was copied to
bbrprod01 and KanCollUtil finds it:
KanCollUtil
root://bbrprod01:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/
23/BCharmoniumToHad_2379
root://bbrprod01:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/
23/BCharmoniumToHad_2379 (48609 events)
but asking the load balancer does not return the file:
KanCollUtil
root://oprserv08:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/
23/BCharmoniumToHad_2379
2004-11-23 16:23:22 7642 Err : TXNetConn::Open - Server
[oprserv08.slac.stanford.edu:1094] did not return OK message for last
request.
2004-11-23 16:23:22 7642 Err : TXNetConn::SendGenCommand - Server
declared error 3005: 'No servers are available to read the file.'
2004-11-23 16:23:22 7642 Err : TXNetFile::CreateTXNf - Error
opening the file
//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/23/
BCharmoniumToHad_2379.01.root on host oprserv08.slac.stanford.edu:1094
ERR Could not open a file expected to contain the event header:
ERR LFN =
root://oprserv08:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/
23/BCharmoniumToHad_2379.01.root
ERR PFN =
root://oprserv08:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/
23/BCharmoniumToHad_2379.01.root
ERR Check collection name and access method...
Not working at all:
- XrdClientAdmin::XrdExistFiles, XrdClientAdmin::XrdDirList and
XrdClientAdmin::XrdGetChecksum only look on the load balancer itself,
i.e. by definition they do not find anything
- xrdcp, XrdClientAdmin::XrdChmod and XrdClientAdmin::XrdRm hang:
041123 16:36:20 001 Xrd: SendGenCommand Server
[oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:25 001 Xrd: SendGenCommand Server
[oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:30 001 Xrd: SendGenCommand Server
[oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:35 001 Xrd: SendGenCommand Server
[oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:40 001 Xrd: SendGenCommand Server
[oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
041123 16:36:45 001 Xrd: SendGenCommand Server
[oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
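One way a caller could avoid hanging indefinitely on repeated wait requests like those in the log above is to cap the total time spent waiting. The following is only a rough sketch in Python, not the xrootd client code: `send_request`, its `("wait", seconds)` / `("ok", payload)` return convention, and `max_total_wait` are all hypothetical names made up for illustration.

```python
import time


class WaitBudgetExceeded(Exception):
    """Raised when the server keeps asking the client to wait."""


def send_with_wait_budget(send_request, max_total_wait=60.0, sleep=time.sleep):
    """Call send_request() until it stops answering with a wait request.

    send_request is a hypothetical callable returning ("wait", seconds)
    or ("ok", payload). Gives up once honouring the next wait would
    exceed max_total_wait seconds in total.
    """
    waited = 0.0
    while True:
        status, value = send_request()
        if status != "wait":
            return value
        if waited + value > max_total_wait:
            raise WaitBudgetExceeded(f"gave up after {waited:.0f} s of waiting")
        sleep(value)
        waited += value
```

With a budget like this, a job would fail with a clear error instead of blocking, which matters once many jobs run unattended.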
In summary, we are close to getting production going if we talk directly
to individual servers. This is clearly less than we wished for. Load
balancing is done on a random basis, not taking server load and disk
space into consideration. Finding files, getting checksums and removing
files require asking all servers. This increases the overall load on
the system.
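To make the cost concrete, here is a minimal sketch (Python, for brevity) of what "asking all servers" amounts to on the client side. `exists_on` is a hypothetical stand-in for whatever per-server existence check is used; the second server name and the fake catalogue in the usage lines are illustrative only.

```python
def find_file(servers, path, exists_on):
    """Return the servers that report having `path`.

    exists_on(server, path) is a hypothetical callable that asks one
    data server directly. Every server is queried, so the number of
    requests grows linearly with the number of servers.
    """
    return [server for server in servers if exists_on(server, path)]


# Illustrative usage with a fake in-memory catalogue instead of real servers.
fake_catalogue = {
    "bbrprod01": {"/prod/store/a.root"},
    "bbrprod02": set(),
}
holders = find_file(
    ["bbrprod01", "bbrprod02"],
    "/prod/store/a.root",
    lambda server, path: path in fake_catalogue[server],
)
```

Each lookup costs one request per server, which is the extra load referred to above.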
In general, a lot of functionality is implemented on the client side
which IMHO should be provided by the server. For example, xrdcp issues
a mkdir command for each directory, instead of just asking the server to
create the whole path. As no wildcard operations currently work, we
need to get a list of files from the server and then ask the server to
remove each file individually.
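As an illustration of the per-directory approach, the sketch below (Python, not taken from xrdcp) lists the directories a client would have to create one by one before writing a file. The function name and the example path are assumptions for illustration.

```python
import posixpath


def parent_dirs(path):
    """List every directory above `path`, shallowest first.

    A client that creates directories one at a time would issue one
    mkdir per entry; a server-side "create whole path" would need a
    single request for the last entry only.
    """
    parts = [p for p in posixpath.dirname(path).split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical example path.
dirs = parent_dirs("/prod/store/SPskims/file.root")
```

For a path nested N levels deep this means N round trips instead of one.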
Cheers,
Remi
---------------------------------------------------------------------
Q: How many kinds of physicists are there?
A: Three. Those who can count and those who can't.
*********************************************************************
Remigius K. Mommsen e-mail: [log in to unmask]
University of California, Irvine URL: http://cern.ch/mommsen
c/o SLAC voice: ++1 (650) 926-3595
2575 Sand Hill Road #35 fax: ++1 (650) 926-3882
Menlo Park, CA 94025, US home: ++1 (650) 233-9041
*********************************************************************