hello Pete, Andy, and all,
at the moment, it is a read-only system.
so the user recreated his file and migrated it to HPSS using the usual
rfio command "rfcp".
it can be considered a backdoor, and it is a special case (I did not
think I would encounter that one), as usually the files are written once /
read many times, so the problem I mentioned is usually avoided in the
HEP world, but it can happen with user or privately produced files. I
understand it will certainly be rarely seen, but it could happen.
I think this issue can be encountered even if the writing is made within
the xrootd service (as opposed to the case where a "backdoor" was used).
There would be 2 cases:
a) 1 pool allowing r/w: assuming one would like to update a file foo
(initial version: v1), we can encounter the case where there are 2
duplicates of foo_v1.root on 2 different servers s1 and s2. Then, let's
assume that s1:foo_v1.root is being updated. We will then have on disk:
s1:foo_v2.root
s2:foo_v1.root
and in HPSS, foo_v2.root will supersede foo_v1.root after migration.
but then you have an inconsistency in the cache.
So someone who wants to read the latest version of foo can be
directed to s2:foo_v1.root. That is not what we want and this is also
why I don't really like the idea of mixing read and write in a single
pool. But a way to avoid the situation would be to add a check against
the MSS core server, as I proposed. I understand the concern that when
the MSS is down, the whole xrootd service could a priori be down too,
even for files already on disk: well, in the case of an MSS failure, the
check would return an error and in that case one could carry on
with the copy on disk.
b) 1 pool for read access + 1 for writing:
clearly, if files are updated in the writing pool, then the read
pool's cache would be out of date and you would have to find a
mechanism to update it. It could be done using xrdcp.
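to make the check concrete, here is a minimal sketch in Python of the
test I proposed (restage if t1 < t2) together with the fallback when the
MSS is down. This is only an illustration, not xrootd code: the names
CacheEntry, MSSUnavailable, needs_restage and mss_stat are hypothetical,
and mss_stat stands in for the "statx" request to the MSS.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class MSSUnavailable(Exception):
    """Hypothetical error raised by the stat call when the MSS is down."""


@dataclass
class CacheEntry:
    path: str
    ctime: float  # t1: creation time of the copy on the cache disk


def needs_restage(entry: Optional[CacheEntry],
                  mss_stat: Callable[[str], float]) -> bool:
    """Decide whether a file must be (re)staged from the MSS.

    mss_stat(path) is assumed to return t2, the last-modified time of
    the master copy, and to raise MSSUnavailable if the MSS cannot answer.
    """
    if entry is None:
        # Not in the disk cache at all: normal staging.
        return True
    try:
        t2 = mss_stat(entry.path)
    except MSSUnavailable:
        # MSS failure: carry on with the copy already on disk.
        return False
    # Cached copy is older than the master copy: restage.
    return entry.ctime < t2
```

with this logic an MSS outage only disables the freshness check; files
already on disk stay readable.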
cheers,
JY
Peter Elmer wrote:
> Hi Jean-Yves,
>
> I didn't realize that you were offering a read/write service to the D0
>people, is that really correct?
>
> As you know we only recently (last fall) brought up the read/write output
>buffer for skimming (which isn't backed by MSS, but a temp buffer), and
>Wilko and I are just now planning to bring up the /store/users read/write
>system (backed by MSS) at SLAC in the next weeks. That was supposed to have
>the stat check to MSS when a new file was being opened for write (only) to
>verify that it does not already exist (i.e. this is the oss.check config
>directive, I think).
>
> As Andy has pointed out, it was not foreseen to allow _updates_ to files
>via any other mechanism than via xrootd itself (if that is what is going
>on). While a mechanism such as the one you suggest could be added to cover
>such "somebody came in through the backdoor to update the file", it seems to
>me like it would be difficult and costly to do it properly and
>generally.
>
> What alternate door did the user use to update the file in HPSS?
>
> Pete
>
>On Wed, Mar 02, 2005 at 08:52:33PM +0100, Jean-Yves Nief wrote:
>
>
>> one of the D0 users accessing files via xrootd encountered the
>>following issue: after having accessed a file via xrootd (so after it was
>>staged from the "master" copy stored in HPSS), he modified the master
>>copy in HPSS and wanted to access the modified file via xrootd: but as
>>the old version of the file was already on the disk cache, no staging
>>occurred of course (but that is the expected behavior obviously) and he
>>grabbed the old version, which is not what he wanted. Well, as an
>>emergency solution and as it was the first time it happened, I've
>>deleted the old version on the disk cache so he could proceed.
>>However, I think it would be nice to have some control on the validity
>>of the cache: one solution would be to add the following test: in case
>>the file is already in the cache, compare the creation time on the cache
>>disk (t1) with the last modified time of the file stored in HPSS (t2):
>>if t1<t2 then restage the file.
>>it will be a very small overhead to the mechanism each time a file is
>>accessed: it just has to issue a "statx" request to the MSS.
>>or maybe there is a simpler solution.
>>cheers,
>>JY
>>
>>
>
>
>
>-------------------------------------------------------------------------
>Peter Elmer E-mail: [log in to unmask] Phone: +41 (22) 767-4644
>Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>-------------------------------------------------------------------------
>
>