  Hi Jean-Yves,

On Thu, Mar 03, 2005 at 11:02:04AM +0100, Jean-Yves Nief wrote:
>          at the moment, it is a read only system.
> so the user recreated his file and migrated it to HPSS using the usual 
> rfio command "rfcp".
> it can be considered as a backdoor and it is a special case (I did not 
> think I would encounter that one), as usually files are written once / 
> read many times, so the problem I mentioned is usually avoided in the 
> HEP world, but it can happen with user or privately produced files. I 
> understand it will certainly be rarely seen, but it could happen.

  Yes, this sounds like an artifact of using simultaneously two different 
systems with different notions of how things should work...

> I think this issue can be encountered even if the writing is made within 
> the xrootd service (as opposed to the case where a "backdoor" was used). 
> There would be 2 cases:
> 
> a) 1 pool allowing r/w: assuming one would like to update a file foo 
> (initial version: v1), we can encounter the case where there are 2 
> duplicates of foo_v1.root on 2 different servers s1 and s2. Then, let's 
> assume that s1:foo_v1.root is being updated. We will then have on disk:
> s1:foo_v2.root
> s2:foo_v1.root
> and in HPSS, foo_v2.root will supersede foo_v1.root after migration,
> but then you have some inconsistency in the cache: someone who wants 
> to read the latest version of foo can be 
> directed to s2:foo_v1.root. That is not what we want, and this is also 
> why I don't really like the idea of mixing read and write in a single 
> pool. But a way to avoid the situation would be to add a check against 
> the MSS core server, as I proposed. I understand that when the MSS is 
> down, all the xrootd servers could a priori be unusable, even if you want 
> to access files already on disk: well, in the case of an MSS failure, the 
> checking operation would return an error, and in that case one could 
> resume activity with the copy on disk.
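A minimal sketch of the read-time check proposed in (a), assuming a hypothetical `mss_version` callable that asks the MSS core server for the latest migrated version of a file. `Replica`, `choose_replica` and `MSSUnavailable` are illustrative names, not part of xrootd. The sketch serves a disk copy only if it matches the MSS, and falls back to any disk copy when the MSS check fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class MSSUnavailable(Exception):
    """Raised by the (hypothetical) MSS lookup when the MSS core server is down."""


@dataclass
class Replica:
    server: str   # disk server holding the copy, e.g. "s1"
    version: int  # version of the file on that server


def choose_replica(replicas: List[Replica],
                   mss_version: Callable[[], int]) -> Optional[Replica]:
    """Pick a disk copy to serve a read request.

    mss_version is a hypothetical callable returning the authoritative
    (most recently migrated) version of the file held in the MSS.
    """
    try:
        latest = mss_version()
    except MSSUnavailable:
        # MSS failure: the check returns an error, so resume activity
        # with whatever copy is already on disk (it may be stale).
        return replicas[0] if replicas else None
    current = [r for r in replicas if r.version == latest]
    # None means no disk copy matches the MSS: the file must be restaged.
    return current[0] if current else None


def mss_down() -> int:
    raise MSSUnavailable("MSS core server not responding")


# The scenario from (a): s1 holds foo_v2, s2 still holds foo_v1.
replicas = [Replica("s2", 1), Replica("s1", 2)]
print(choose_replica(replicas, lambda: 2))  # Replica(server='s1', version=2)
print(choose_replica(replicas, mss_down))   # Replica(server='s2', version=1)
```

Note the cost of this design: every read has to contact the MSS, which is exactly the load the reply below tries to avoid.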

  This was the idea behind the "oss.check" directive. As I understood it,
it would basically disallow write/update access once the file has been 
migrated to the MSS. This is what we were intending to use for precisely 
this purpose in the /store/users r/w space. Note that the check would have to 
happen when files are opened for _write_, not read, which is again the 
economy we want to achieve: reads never need to contact the MSS. 
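For contrast, a sketch of the write-time check as described here. It models only the intent of "oss.check", not its actual implementation. `MSSStub` and `check_open` are hypothetical names, and the mode strings follow Python's `open()` convention. The stub counts MSS queries to show that plain reads never reach the MSS.

```python
class MSSStub:
    """Hypothetical stand-in for the MSS: knows which paths have been
    migrated and counts how many times it has been queried."""

    def __init__(self, migrated):
        self.migrated = set(migrated)
        self.queries = 0

    def is_migrated(self, path: str) -> bool:
        self.queries += 1
        return path in self.migrated


def check_open(path: str, mode: str, mss: MSSStub) -> str:
    """Hypothetical open-time hook: only opens for write/update consult
    the MSS, and they are refused once the file has been migrated."""
    writing = any(flag in mode for flag in "waxy+")
    if writing and mss.is_migrated(path):
        raise PermissionError(f"{path} is already in the MSS; it is now read-only")
    return "write" if writing else "read"


mss = MSSStub({"/store/users/foo.root"})
print(check_open("/store/users/foo.root", "r", mss), mss.queries)  # read 0
print(check_open("/store/users/bar.root", "w", mss), mss.queries)  # write 1
try:
    check_open("/store/users/foo.root", "r+", mss)
except PermissionError as err:
    print("denied:", err)
```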

  Wilko and I intend to set up the /store/users area at SLAC after the
new production xrootd version is running everywhere next week. Note that we 
don't expect to do _automatic_ migration to the MSS from this area. The idea 
is that a user will have to run some command to trigger the stage out. (In 
this case in BaBar we will give users some command that puts the information 
into the dataset bookkeeping and wraps the lower level command that actually 
triggers the stageout.) When this command is issued, the file would have to 
be removed from the "write" pool so it cannot be further updated. It can 
be moved to a r/o pool as an optimization or just staged back from MSS to 
the r/o machines when needed. The write machine is not allowed to stage files 
in from the MSS, hence the file can no longer be updated. Thus when a user 
runs this command they are basically saying "this is the final version of 
this file, from now on it is read-only". And the r/o system stays simple and 
scalable.
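The workflow above can be sketched as a small state model. `UserFile`, `Pool` and the method names are hypothetical, and the real command would also update the BaBar bookkeeping and wrap the actual stage-out tool, which this sketch does not model. The point it shows is that "cannot be further updated" follows from two rules: the explicit command removes the file from the write pool, and write servers may not stage in from the MSS.

```python
from enum import Enum


class Pool(Enum):
    WRITE = "write pool"
    READ_ONLY = "r/o pool"


class UserFile:
    """Hypothetical model of a file in the /store/users area."""

    def __init__(self, name: str):
        self.name = name
        self.version = 1
        self.on_disk = {Pool.WRITE}  # pools currently holding a copy
        self.in_mss = False

    def update(self) -> int:
        # Only a copy in the write pool may be modified.
        if Pool.WRITE not in self.on_disk:
            raise PermissionError(f"{self.name} is final (read-only)")
        self.version += 1
        return self.version

    def finalize(self, copy_to_ro: bool = False) -> None:
        """The explicit user command: stage out to the MSS, then remove the
        file from the write pool (optionally copying it to a r/o pool)."""
        self.in_mss = True
        self.on_disk.discard(Pool.WRITE)
        if copy_to_ro:
            self.on_disk.add(Pool.READ_ONLY)

    def stage_in(self, pool: Pool) -> None:
        # Write servers never stage in from the MSS, so a finalized file
        # can never reappear somewhere it could be updated.
        if pool is Pool.WRITE:
            raise PermissionError("write servers may not stage in from the MSS")
        if not self.in_mss:
            raise FileNotFoundError(f"{self.name} has not been staged out")
        self.on_disk.add(pool)


f = UserFile("foo.root")
f.update()                  # still in the write pool, so updates are allowed
f.finalize()                # "this is the final version of this file"
f.stage_in(Pool.READ_ONLY)  # staged back to the r/o machines when needed
```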

  Does this seem reasonable? Wilko/Andy, did I miss anything? 

  (Unless Andy or you have a surprise for me, Artem and the pud daemon will 
perhaps get their revenge here as there is currently no mechanism I can
see to trigger the stage out via xrootd itself... ;-) Presumably the mps
scripts have to be used directly until such a general mechanism is provided.

> b) 1 pool for read access + 1 for writing:
> clearly, if files in the writing pool get updated, then the read 
> pool's cache would be out of date and you would have to find a 
> mechanism to update it. It could be done using xrdcp.

  I think that a combined r/w + r/o pool can be done as described above, with
the caveat that a command would have to be run explicitly to trigger a
stage-out. This is probably sufficient for the user areas. (True production
areas could of course have automated migration to MSS as they like, as long
as they are willing to say "once I drop the file in it can be considered r/o".)

  Comments? (From anyone.)

                                   Pete

-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------