Hi Fabrizio and Roberto,


Fabrizio Furano wrote:
> Hi Pavel,
>
>  thank you very much for the info. That is very useful for me. However, 
> I think there is a version-based misunderstanding mixed with some 
> bugs in the most recent versions. It looks like the old prod 
> version had no xrootd.prep directive; that's the point of 
> confusion.
>
>  In the meantime I reverted back to my local machine (and a simpler 
> config file) to do this kind of tests. I believe I found a weirdness 
> in the config error as reported. Here it is:
>
> - no xrootd.prep directive ---> prepare is off (ok!)
> - xrootd.prep logdir /tmp/prestage_log ---> the server refuses to start
Unfortunately, we don't use this directive, since we are not trying to 
track prepare requests on the cluster.
If I do not specify the directive in my version (a CVS checkout from 
20070428), I just get the message "'xrootd.prepare logdir' not specified; 
prepare tracking disabled."
This doesn't mean that prepare itself is disabled. You are just not 
tracking requests.
>
> Step 1:
>  Imho there is some bug in the config parsing, because having a look 
> at the code and at the strace log, we see that xrootd creates the dir 
> /tmp/prestage_log, and then looks for /tmp/prestage_log/anon, which 
> was not created. Hence it exits with an error. This is very confusing. 
> Andy, what do you think?
>
> 070830 14:22:40 001 XrootdConfig: Unable to process logdir /tmp/prestage_log; no such file or directory
> =====> xrootd.prep logdir /tmp/prestage_log
>
>
> Step 2:
>  I create the dir /tmp/prestage_log/anon, and now the server starts, 
> apparently fine.
Hmm, there should be a MkDir somewhere. As you can see, anon is the 
default instance name if you do not specify one. So there is a bug 
somewhere.
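For illustration only, the behaviour I would expect here can be sketched in Python (this is not the actual xrootd code): create the log directory together with the per-instance subdirectory before using it. The function name ensure_prep_logdir is hypothetical, and the default instance name anon is taken from what you observed above.

```python
import os

def ensure_prep_logdir(logdir, instance="anon"):
    """Create <logdir>/<instance> if it is missing and return its path.

    Hypothetical helper: models the expected behaviour, not xrootd itself.
    """
    path = os.path.join(logdir, instance)
    # exist_ok=True makes the call safe when the directory is already there.
    os.makedirs(path, exist_ok=True)
    return path
```

With something like this in place, Step 2 (creating /tmp/prestage_log/anon by hand) would not be needed.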
>
> Step 3:
>  I connect with the xrd app and request a prepare. The server answers 
> ok, and nothing happens... This is the request sent by the client:
>
>
> ================= DUMPING CLIENT REQUEST HEADER =================
>                 ClientHeader.streamid = 0x01 0x00
>                ClientHeader.requestid = kXR_prepare (3021)
>          ClientHeader.prepare.options = 0x08
>             ClientHeader.prepare.prty = 0x00
>         ClientHeader.prepare.reserved = 0 repeated 14 times
>              ClientHeader.header.dlen = 5
> =================== END CLIENT HEADER DUMPING ===================
>
>
>  And this is what the server says in the log (olbd says nothing):
>
> 070830 14:38:29 2295 furano.2318:[log in to unmask] XrootdProtocol: 0100 req=3021 dlen=5
> 070830 14:38:29 2295 furano.2318:[log in to unmask] XrootdResponse: 0100 sending 24 data bytes; status=0
> 070830 14:38:32 2295 furano.2318:[log in to unmask] XrootdProtocol: 0100 request timeout; read 0 of 24 bytes
> 070830 14:38:32 2295 furano.2318:[log in to unmask] XrdPoll: sending poller 0 enable for link 15
> 070830 14:38:32 2295 XrdPoll: Poller 0 enabled fd 15 entry 1 now at 2
> 070830 14:38:54 2295 XrdSched: running furano.2318:[log in to unmask] inq=0
> 070830 14:38:54 2295 XrootdXeq: furano.2318:[log in to unmask] disc 0:00:44
>
>
> Step 4:
>  If I lurk inside /tmp/prestage_log/anon I see that the server created 
> some files with complicated names, containing the filename I requested 
> to stage. Now I am sure I am missing something, because still nothing 
> happens. Instead the normal staging works. What's missing here? Andy, 
> Pavel, do you have any suggestion? I attach my config file to the email.
Could you please paste the list of files that are there? If the prepare 
was successful, you should see the file request in the following file:

/tmp/pre_stage.queue.0
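If it helps, here is a small Python sketch for collecting that information in one go. It only lists the files under the log directory and reports the size of the queue file. The function name prepare_state is hypothetical; the two paths in the example call are the ones from this thread.

```python
import os

def prepare_state(logdir, queue_file):
    """Return (sorted list of files under logdir, queue size or None).

    Hypothetical diagnostic helper; it reads nothing but file names and sizes.
    """
    logged = []
    # os.walk yields nothing if logdir does not exist, so this is safe.
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            logged.append(os.path.join(root, name))
    size = os.path.getsize(queue_file) if os.path.exists(queue_file) else None
    return sorted(logged), size

if __name__ == "__main__":
    files, size = prepare_state("/tmp/prestage_log/anon",
                                "/tmp/pre_stage.queue.0")
    print("log files:", files)
    print("queue file size:", size)
```

A queue size of None would mean the queue file was never created, which would suggest that the request did not reach the prep command.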


The way it should work is that xrootd on the redirector node passes the 
prepare request to olbd, and olbd keeps track of the servers that are 
able to prepare files (you declare this with the olb.prep directive). 
When the olbd on the redirector node chooses the best node to serve the 
prepare request, the request is passed through the olbd to that server.
The chosen server then issues an add command to its forked olb prep 
command, and that prep command should add the file to its queue (this 
is /tmp/pre_stage.queue.0).

Now it is time to bring the files in from your tape system. To do this, 
you need to run an executable called mps_PreStage on each data server. 
It needs to be started alongside xrootd and olbd as a separate process. 
It looks into /tmp/pre_stage.queue.0, grabs files one by one, and runs 
the command given by mps.pstg.pstgcmd in your configuration.
Don't forget that you need to start mps_PreStage with the -c option to 
pass the configuration file.
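To make the idea concrete, here is a toy model in Python of the loop just described. It is not the real mps_PreStage. I am assuming, for the sake of the sketch, that the queue file holds one file path per line (the real queue format may well differ), and the stage callable stands in for whatever mps.pstg.pstgcmd runs. The name drain_queue is hypothetical.

```python
def drain_queue(queue_file, stage):
    """Take entries from queue_file one by one and call stage(entry).

    Toy model only. Assumes one entry per line, which is an assumption
    and not the documented queue format. Returns the entries handled.
    """
    staged = []
    while True:
        with open(queue_file) as f:
            entries = [line.strip() for line in f if line.strip()]
        if not entries:
            return staged
        head, rest = entries[0], entries[1:]
        stage(head)           # stand-in for running the pre-stage command
        staged.append(head)
        # Rewrite the queue without the entry that was just handled.
        with open(queue_file, "w") as f:
            f.writelines(entry + "\n" for entry in rest)
```

The point is only that nothing gets staged unless some process is consuming the queue. Note that this sketch does no locking, so it is not safe against a prep command adding entries at the same time.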

Could you please enable

olb.trace all debug
odc.trace all debug

in your configuration files? Then we can see whether the request is 
passed to olb.

Thanks. I hope I didn't confuse you.
>
> Thank you!
> Fabrizio
>
>
>
>
>
>
>
> ------------------------------------------------------------------------
>
> #
> # $Id: xrootd.cf.example,v 1.2 2004/08/17 15:02:53 elmer Exp $
> #
> # The following is a sample xrootd configuration. The relevant prefixes are:
> #
> # acc    - Access Control
> # odc    - Open Distributed Cache
> # ofs    - Open File    System
> # oss    - Open Storage System
> # sec    - Security
> # xrd    - Extended Request Daemon
> # xrootd - Xrootd Service
>
>
> #-----------------------------------------------
>
> # The Open Distributed Cache Section
> #
>
> # Tell xrootd who is the redirection manager
> #
> #odc.manager kanolb-a+ 3121
>
> #-----------------------------------------------
>
> # The Open File System Section
> #
>
> # Identify which machines will be redirecting clients
> #
> #ofs.redirect  remote if kanolb-a+
>
> # Identify which machines clients will be redirected to (need not be inclusive)
> #
> #ofs.redirect  target if kan0*
>
> #-----------------------------------------------
>
> # The Open Storage System Section
> #
>
> # Indicate where we have mounted filesystems that can be used for space
> # The cache directive will also be used by the olb. So, we need not repeat it.
> #
> oss.cache public /tmp/cache*
>
> # While we highly recommend that you avoid path prefixing, here we indicate
> # the actual way files are physically named (i.e., the lfn to pfn mapping). The
> # localroot is how we name this on the data server while remoteroot tells the
> # system the corresponding name in the Mass Storage System.
> #
> oss.localroot /tmp/kanga
> oss.remoteroot /tmp/kanga_mss
>
> # Since we are using a Mass Storage System we need to indicate how MSS meta-data
> # information is retrieved (mssgwcmd) and how files are retrieved (stagecmd).
> # The xfr directive will limit the number of simultaneous stages to eight.
> #
> oss.mssgwcmd /home/furano/xrdtestmss/mymssgwcmd.pl
> oss.stagecmd /home/furano/xrootd/utils/mps_Stage -d -c /home/furano/xrdtestmss/xrootd.cf.fabrizio
> oss.xfr 8
>
> # Finally, we need to indicate, by path, which ones are migratable. This will
> # indicate which logical paths (i.e., lfn's) should exist in the MSS.
> #
> oss.path / nodread mig
>
> #-----------------------------------------------
>
> # The XRD Section
> #
> # Generally, xrd defaults are fine. So, no need to change them.
> #
>
> #-----------------------------------------------
>
> # The Xrootd Section
> #
> # Here we load the extended file system support for xrootd
> #
> xrootd.fslib /home/furano/xrootd/lib/arch_dbg/libXrdOfs.so
> xrootd.export /
>
> xrootd.prep logdir /tmp/prestage_log
>
>
> #-----------------------------------------------
>
> #-----------------------------------------------
>
> # The Open Load Balancer Section
> #
> # For managers, use the allow, port, and sched, and wait directives.
> # For Servers,  use the path, perf, port, and subscribe directives.
> #
> # Indicate which hosts are allowed to connect to the olb (even if localhost)
> #
> #olb.allow host kan*.slac.stanford.edu
> #olb.allow host bbr-olb*.slac.stanford.edu
>
> # Tell the olb what port to use (server and manager)
> #
> olb.port 3121
>
> # To use load based scheduling, specify a load formula using sched
> #
> olb.sched cpu 100 
>
> # Tell each data server what paths to export
> #
> olb.path s /
>
> # Tell each data server who to subscribe to (i.e., the managers)
> #
> #olb.subscribe kanolb-a+ 3121
>
> # Tell each olb server to wait for the corresponding xrootd server
> #
> olb.wait
>
> # To effect load based scheduling, we must start a performance monitor
> #
> #olb.perf int 180 pgm /opt/olbd/bin/cur/XrdOlbMonPerf 120
>
> olb.prepmsg + $RID $NOTIFY $PRTY $OPTS $PFN
> olb.prep echo ifpgm /home/furano/xrootd/utils/mps_prep -c /home/furano/xrdtestmss/xrootd.cf.fabrizio
>
> # tell to the prestage cmd the location of the file which holds pre-staging queue (used by mps_prepare and mps_prestage program)
> mps.pstg.prtyfn /tmp/pre_stage.queue
> mps.pstg.logfn /tmp/prestage_log
>
>
> #-----------------------------------------------
>
> #-----------------------------------------------
>
> # The MPS Section
> #
> # Here code the relevant directives to control migration, purge, and staging
> # Most of the defaults are likely wrong. The typical ones to specify are:
> #
> # Where error messages go via mail and how often they should go
> #
> mps.adminuser = furano
>
> mps.msscmd = /home/furano/xrdtestmss/mymssgwcmd.sh
> mps.mssstat = stat
> mps.xfrcmd = /home/furano/xrdtestmss/mymssxfr.sh %sfn %tfn
>
>
> # For migration, how long to wait between runs, how long a failure is to be
> # recognized (after which the operation is retried), how often purge should
> # run, and the high/low purge thresholds.
> #
> mps.migr.rmfail_time = 129600
> mps.migr.waittime = 600
> mps.purg.waittime = 3600
> mps.purg.lousedpct = 80
> mps.purg.hiusedpct = 80
>
> # For pre-stage, we generally want to indicate the maximum number allowed
> # at one time and the command to use to transfer data from the mss
> #
> mps.pstg.max_pstg_proc = 3
> mps.pstg.pstgcmd = /home/furano/xrootd/utils/mps_Stage -i  -d -c /home/furano/xrdtestmss/xrootd.cf.fabrizio
>
> # Generally, for all components, we need to indicate who the MSS transfer user
> # is, the target host and port. Usually, we limit retries to two.
> #
> mps.xfrhost = pcbabar9
> mps.xfruser = furano
> mps.xfrport = 2021
> mps.stage.max_retry = 2
> mps.debug = 4
> #-----------------------------------------------
>
>