  Hi Remi,

On Tue, Nov 23, 2004 at 08:28:59PM -0800, Remi Mommsen wrote:
> Not working:
> - removing directories

  For the moment this doesn't work. I'll add it to the known problem list,
but it isn't considered critical.

> With the load balancer (oprserv08)
> ==================================
> 
> - reading collections with standard access methods works *except* for  
> files written in the last 8(?) hours when the load balancer was asked  
> for the file before it did exist. This collection was copied on  
> bbrprod01 and KanCollUtil finds it:
> 
> KanCollUtil  
> root://bbrprod01:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/ 
> 23/BCharmoniumToHad_2379
> root://bbrprod01:1094//prod/store/SPskims/R14/16.0.1a/BCharmoniumToHad/ 
> 23/BCharmoniumToHad_2379 (48609 events)
> 
> but asking the load balancer does not return the file:
<....>

  This is presumably because you are xrdcp-ing the file directly to an
individual machine. When you switch to copying things in via the load
balancer itself, this should not be a problem. Why aren't you copying
things into the buffer via the load balancer? (i.e. what is the last
outstanding problem for doing that?)

> Not working at all:
> - XrdClientAdmin::XrdExistFiles, XrdClientAdmin::XrdDirList and  
> XrdClientAdmin::XrdGetChecksum just look on the load balancer itself,  
> i.e. they do not find anything by definition

  Andy committed a fix for this (server side) last night. I'll make a
version later today which can be started.

> - xrdcp, XrdClientAdmin::XrdChmod and XrdClientAdmin::XrdRm hang:
> 041123 16:36:20 001 Xrd: SendGenCommand Server  
> [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
> 041123 16:36:25 001 Xrd: SendGenCommand Server  
> [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
> 041123 16:36:30 001 Xrd: SendGenCommand Server  
> [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
> 041123 16:36:35 001 Xrd: SendGenCommand Server  
> [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
> 041123 16:36:40 001 Xrd: SendGenCommand Server  
> [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait
> 041123 16:36:45 001 Xrd: SendGenCommand Server  
> [oprserv08.slac.stanford.edu:1094] requested 5 seconds of wait

  It sounds like this one should also be fixed with the new server version.

> In summary we are close to get the production going if we talk directly  
> to individual servers. This is clearly less than we wished for. Load  
> balancing is done on a random basis, not taking server load and disk  
> space into consideration. Finding files, getting checksums and removing  
> files requires asking all servers. This increases the overall load on  
> the system.

  Let's see if the latest round of fixes allows you to do everything via
the load balancer...

> In general a lot of functionality is implemented on the client side  
> which IMHO should be provided by the server: For example xrdcp issues  
> for each directory a mkdir command, instead just asking the server to  
> create the whole path. 

  There are two things here:

   o The "mkdir -p" type functionality. This isn't there, but Andy says he
     will add it.

   o The fact that the mkdir has to be done at all (one could just have that
     happen automatically any time a file is opened for write if some bit
     is set). Andy will look into this possibility, too.
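
  To make the first point concrete, here is a rough sketch of the difference
between one mkdir request per directory level and a single "mkdir -p" style
request. It uses the local filesystem as a stand-in and is not xrootd code;
the function names and the request counting are assumptions made purely for
illustration.

```python
import os
import tempfile


def path_prefixes(path):
    """Return each successive directory prefix of a relative path."""
    parts = [p for p in path.split("/") if p]
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def mkdir_per_component(root, path):
    """Per-level approach: one mkdir request for every directory level.

    Returns the number of requests issued (hypothetical metric).
    """
    requests = 0
    for prefix in path_prefixes(path):
        target = os.path.join(root, prefix)
        if not os.path.isdir(target):
            os.mkdir(target)
        requests += 1
    return requests


def mkdir_whole_path(root, path):
    """"mkdir -p" approach: one request creates every missing level."""
    os.makedirs(os.path.join(root, path), exist_ok=True)
    return 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(mkdir_per_component(root, "prod/store/SPskims/R14"))
    with tempfile.TemporaryDirectory() as root:
        print(mkdir_whole_path(root, "prod/store/SPskims/R14"))
```

  The deeper the path, the more round trips the per-level approach costs,
which is why doing this in a single server-side request is attractive.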

> As currently no wildcard operations work, we  
> need to get a list of files from the server and then ask the server to  
> remove each file individually.

  This shouldn't be a big deal. Actually I don't understand why you
need to ask the server for the list of files. Don't you already know that
as part of the skim bookkeeping anyway? Or do you mean: we need to get
the list of servers the files are on since rm wasn't working via the load
balancer and then delete them manually? (That shouldn't be a problem now
as mentioned above.)

                                   Pete

-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------