Thanks Max,

I never looked at the code that computes the path and name of the "real" file 
(when using XA spaces), so I wasn't sure that it depended purely on the 
path/name of the file within the namespace and would agree between dataservers.

In our systems, the extended attributes for a file only contain the checksum and 
the PFN, but not the space name.

I will investigate using rsync since it does preserve the metadata, but there 
may be an issue with our space reporting, which relies on oss.usage on each 
dataserver.  I know that I can fix it periodically with frm_admin, but there may 
be other issues in the overall system if the reporting strays from the real 
values.

Regards,

Patrick




On 03/02/2018 02:39 AM, Fischer, Max (SCC) wrote:
> Hi,
> 
> if you have access to the servers directly, I strongly recommend using rsync to do the copying. This allows you to preserve file metadata and even namespace if the layout is the same. We recently transferred about 4 PB this way.
> 
> Concerning the space names: XRootd stores all such metadata in the file's extended attributes. You can query this using the getfattr command line utility, or for example the python3 os.getxattr function. Make sure to reconstruct this using the same toolset on the destination.
> 
> Cheers,
> Max
> 
>> On 02.03.2018 at 09:10, Patrick McGuigan <[log in to unmask]> wrote:
>>
>> Thanks Wei,
>>
>> I will look at implementing a service, or having the copy script build a script that I can push to the destination server to do a mass update.
>>
>>
>> I did find an awkward way to determine the space name associated with a file:
>>
>> # xrdfs localhost query xattr /xrd/atlasproddisk/rucio/mc15_13TeV/cc/b9/EVNT.06212746._084813.pool.root.1
>> oss.cgroup=ATLASPRODDISK&oss.type=f&oss.used=3045780&oss.mt=1519975567&oss.ct=1519976072&oss.at=1519975567&oss.u=*&oss.g=*&oss.fs=w&ofs.ap=a
>>
>> I'll have to investigate if using xrdfs is worth the effort.
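
A small shell sketch of pulling the space name out of that reply, assuming the reply is always a single '&'-separated list of key=value pairs as in the output above. The function name space_name is made up for illustration; it only parses a string and does not contact a server.

```sh
#!/bin/sh
# Hypothetical helper: extract the oss.cgroup value (the space name) from
# the reply printed by "xrdfs <host> query xattr <path>".
space_name() {
    # Split the '&'-separated reply into one key=value pair per line,
    # then print the value of the oss.cgroup key, if present.
    printf '%s\n' "$1" | tr '&' '\n' | sed -n 's/^oss\.cgroup=//p'
}

# Shortened copy of the reply shown above.
reply='oss.cgroup=ATLASPRODDISK&oss.type=f&oss.used=3045780&oss.mt=1519975567'
space_name "$reply"
```

In a copy script the argument would be the captured output of the xrdfs command rather than a literal string. If the key is missing, the function prints nothing, which the caller can treat as "fall back to the path-based mapping".
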
>>
>> Thanks again,
>>
>> Patrick
>>
>>
>>
>>
>> On 03/02/2018 01:31 AM, Yang, Wei wrote:
>>> Hi Patrick,
>>>
>>> I don't know the answer to the 1st question. For the 2nd question, there is no way in xrootd to do this. So I use a separate helper service on the data servers:
>>>
>>> So if your copy script does something like "echo filename mtime atime | nc dataserver 5151", then this script will help you set the mtime.
>>>
>>> #!/bin/sh
>>>
>>> # xinetd service
>>> #service setmtime
>>> #{
>>> #    type        = UNLISTED
>>> #    socket_type = stream
>>> #    protocol    = tcp
>>> #    wait        = no
>>> #    user        = atldq2
>>> #    server      = /u/at/atldq2/bin/setmtime.sh
>>> #    port        = 5151
>>> #    log_on_failure  = HOST ATTEMPT
>>> #    log_on_success  =
>>> #    disable     = no
>>> #}
>>>
>>> exec > /dev/null 2>&1
>>> # Expect one line: "filename mtime [atime]", times in epoch seconds.
>>> read -r line
>>> set -- $line   # unquoted on purpose: split the line into $1 $2 [$3]
>>> [ $# -ne 2 ] && [ $# -ne 3 ] && exit
>>> [ ! -f "$1" ] && exit
>>> if echo "$2" | grep -Eq '^[0-9]+$'; then  # set mtime
>>>      touch -m -c -d "@$2" "$1"
>>> fi
>>> [ $# -eq 2 ] && exit
>>> if echo "$3" | grep -Eq '^[0-9]+$'; then  # set atime
>>>      touch -a -c -d "@$3" "$1"
>>> fi
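
For the sending side, a minimal sketch of building the "filename mtime atime" line that the helper above reads. It assumes GNU stat on the source host; the function name timestamp_line is hypothetical, and "dataserver" and port 5151 are the placeholders from the message.

```sh
#!/bin/sh
# Hypothetical sender side for the helper above: build the
# "filename mtime atime" line for one file. Requires GNU stat.
timestamp_line() {
    # %n = file name, %Y = mtime, %X = atime (both in epoch seconds)
    stat -c '%n %Y %X' -- "$1"
}

# Example use, after the file has been copied to the destination:
#   timestamp_line /path/to/file | nc dataserver 5151
```

Because the helper splits its input on whitespace, this only works for file names without spaces, and the path sent must be the path of the file as seen on the destination server.
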
>>>
>>>
>>>
>>>
>>> --
>>> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: <[log in to unmask]> on behalf of Patrick McGuigan <[log in to unmask]>
>>> Date: Friday, March 2, 2018 at 2:00 AM
>>> To: xrootd-l <[log in to unmask]>
>>> Subject: Mass data movement between dataservers
>>>
>>>> Hi,
>>>>
>>>> I need to move large numbers of files between dataservers.  In one case this is
>>>> because I want to retire an existing host and, in another case, I need to free
>>>> up space on data servers so that overall usage among the dataservers is more
>>>> balanced.
>>>>
>>>> I can do this with existing scripts that I have used in the past, but there are
>>>> places I would like to improve things.
>>>>
>>>> The general operation is to:
>>>> 1) create a list of files to be moved and a list of dataservers that will accept
>>>> the copies.
>>>>
>>>> 2) Determine the space name associated with each file, so that the destination
>>>> maintains the space name.
>>>>
>>>> 3) In multiple threads copy the file with space information, verify the local
>>>> and remote checksums, delete the local file.  I normally kick off one process
>>>> per destination host.
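
The per-file part of step 3 can be sketched as follows. This is only an outline under stated assumptions: plain cp and cksum stand in for the real transfer tool and checksum type, and the function name move_one is made up for illustration.

```sh
#!/bin/sh
# Sketch of "copy, verify, delete" for one file. cp and cksum are
# stand-ins (assumptions), not the tools used in the original scripts.
move_one() {
    src=$1
    dst=$2
    cp -p -- "$src" "$dst" || return 1
    # Reading from stdin makes cksum print only "<crc> <size>",
    # so the two results can be compared as plain strings.
    src_sum=$(cksum < "$src") || return 1
    dst_sum=$(cksum < "$dst") || return 1
    if [ "$src_sum" = "$dst_sum" ]; then
        rm -f -- "$src"      # remove the source only after a match
    else
        rm -f -- "$dst"      # discard the bad copy, keep the source
        return 1
    fi
}
```

The point of the ordering is that the source is deleted only after both checksums have been computed and compared, so a failed or truncated copy never costs the original.
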
>>>>
>>>>
>>>> I utilize a mapping function that looks at the path of a local file to determine
>>>> what space name it should belong to.  I am wondering if I can ask the dataserver
>>>> holding the file what the space name is.  Is this possible, or can I make
>>>> xrdcp replicate the opaque information?
>>>>
>>>>
>>>> Another place I would like to improve things is to maintain the timestamps
>>>> associated with the original file, so that the modtime of the copied file
>>>> matches the modtime of the original file.  Is there a way of doing the
>>>> equivalent to:
>>>>
>>>> touch -mt <STAMP> path
>>>>
>>>> on a file stored in xrootd?
>>>>
>>>>
>>>>
>>>> Should I be using FRM to manage all of this?
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Patrick
>>>>
>>>> ########################################################################
>>>> Use REPLY-ALL to reply to list
>>>>
>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>
> 
