I just found one file where the problem with the offset occurred in the
middle of the transfer (not at the beginning):

190907 20:03:53 25211 XrootdXeq: atlasprd.2817:168@[2001:718:401:6e03::1:51] pub IP64 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1
190907 20:03:55 25211 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@0 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:03:55 25211 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@262144 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:03:55 25211 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@524288 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:03:55 25211 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@786432 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:03:55 25211 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@1048576 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
...
190907 20:40:35 31589 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@2952003584 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:40:53 31589 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@2952265728 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:41:55 31609 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@2952265728 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:42:05 25089 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 1919238960@7379557499097268579 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:42:13 25118 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 1919238960@7379557499097268579 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:42:13 25118 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@2953314304 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:42:13 25118 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@2953576448 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload
190907 20:42:20 25118 atlasprd.2817:168@[2001:718:401:6e03::1:51] ofs_write: 262144@2953576448 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/6a/AOD.18892557._000593.pool.root.1.rucio.upload

The whole xrootd.log for this transfer is available at
http://novastore.farm.particle.cz/tmp/xrootd_bigfile_wrong_offset_in_the_middle_of_transfer.txt

Petr

On 9/8/19 10:25 AM, Petr Vokac wrote:
> I executed basically the following Python code from a bash script in a loop:
> import gfal2
> ctx = gfal2.creat_context()
> ctx.set_opt_string("XROOTD PLUGIN", "XRD.WANTPROT", "gsi,unix")
> params = ctx.transfer_parameters()
> params.timeout = 60
> ctx.filecopy(params, "file:///tmp/1M", "root://golias100.farm.particle.cz/dpm/farm.particle.cz/home/atlas/1M")
> and after a while new corrupted files appeared on our storage,
> including one big file:
> [vokac@ui1 ~]$ gfal-ls -l 'davs://golias100.farm.particle.cz/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000'
> -rw-rw-r--   0 1310  5281  139703410778112 Sep  7 19:01 davs://golias100.farm.particle.cz/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000	
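>
> The loop was roughly the following (a minimal sketch - the loop wrapper,
> iteration count, and destination naming here are my reconstruction, and the
> XRD_LOGLEVEL/XRD_LOGMASK variables only take effect if set before
> gfal2/XrdCl is loaded):
>
> import os
> import time
> # must be set before gfal2 (and thus XrdCl) is imported to get the dump-level client log
> os.environ["XRD_LOGLEVEL"] = "Dump"
> os.environ["XRD_LOGMASK"] = "All"
> import gfal2
>
> for i in range(100):  # iteration count is arbitrary
>     ctx = gfal2.creat_context()
>     ctx.set_opt_string("XROOTD PLUGIN", "XRD.WANTPROT", "gsi,unix")
>     params = ctx.transfer_parameters()
>     params.timeout = 60
>     # unique destination name per attempt, mimicking the suffixes above
>     dst = "root://golias100.farm.particle.cz/dpm/farm.particle.cz/home/atlas/1M.%05d.%d" % (i, int(time.time()))
>     ctx.filecopy(params, "file:///tmp/1M", dst)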
> Now I also have debug output from the gfal client (including
> XRD_LOGLEVEL=Dump + XRD_LOGMASK=All) for this file, together with the
> xrootd.log on the server side:
> [root@dpmpool19 ~]# grep condor.39253:171 /var/log/xrootd/dpmdisk/xrootd.log
> 190907 19:01:19 29645 XrootdXeq: condor.39253:171@[2001:718:401:6e03::1:82] pub IP64 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=vokac/CN=610071/CN=Petr Vokac
> 190907 19:01:22 32625 condor.39253:171@[2001:718:401:6e03::1:82] ofs_write: 1009440128@139703410643808 fn=/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000
> 190907 19:01:23 32625 condor.39253:171@[2001:718:401:6e03::1:82] ofs_write: 131072@131072 fn=/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000
> 190907 19:01:23 32625 condor.39253:171@[2001:718:401:6e03::1:82] ofs_write: 131072@262144 fn=/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000
> 190907 19:01:24 32625 condor.39253:171@[2001:718:401:6e03::1:82] ofs_write: 131072@393216 fn=/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000
> 190907 19:01:24 32625 condor.39253:171@[2001:718:401:6e03::1:82] ofs_write: 131072@524288 fn=/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000
> 190907 19:01:25 32625 condor.39253:171@[2001:718:401:6e03::1:82] ofs_write: 131072@655360 fn=/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000
> 190907 19:01:25 32625 condor.39253:171@[2001:718:401:6e03::1:82] ofs_write: 131072@786432 fn=/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000
> 190907 19:01:25 32625 condor.39253:171@[2001:718:401:6e03::1:82] ofs_write: 131072@917504 fn=/dpm/farm.particle.cz/home/atlas/1M.01556.20190907190118.646902567.00000
> 190907 19:01:26 32625 XrootdXeq: condor.39253:171@[2001:718:401:6e03::1:82] disc 0:00:07
> This file was transferred from mff82.farm.particle.cz
> (2001:718:401:6e03::1:82) to the storage node dpmpool19.farm.particle.cz
> (2001:718:401:6017:2::19); both machines run
> xrootd-4.9.1-1.el7.x86_64, and mff82 runs gfal2-2.16.3-1.el7.x86_64.
>
> I also tried to calculate the adler32 for the data blocks at the positions
> logged by the server, and the result matches the source data checksum. When
> I look at other transfers that finished with "SUCCESS" but with a filesize
> different from the source file, I can see that the problem always occurs
> in the first "ofs_write" log line, where size@offset contains wrong numbers.
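>
> (The range check was essentially the following; a minimal sketch - the
> offsets here are just one example block from the log above:)
>
> import zlib
>
> def adler32_range(path, offset, length, start=1):
>     # adler32 over bytes [offset, offset+length); `start` allows chaining
>     # consecutive blocks into one running checksum
>     with open(path, "rb") as f:
>         f.seek(offset)
>         return zlib.adler32(f.read(length), start) & 0xffffffff
>
> # e.g. the 131072-byte block the server logged at offset 131072
> print("%08x" % adler32_range("/tmp/1M", 131072, 131072))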
>
> Attached files:
>
> 1M.01556.20190907190118.646902567.00000.dump ... xrd with dump log level
> 1M.01556.20190907190118.646902567.00000.log ... gfal2 verbose logging
> (+ a few log lines from my script)
> 1M.01556.20190907190118.646902567.00000.xrootd.long ... dpmpool19
> xrootd.log
>
> I have more examples of successfully transferred files with a wrong size
> (sometimes bigger than the original 1M size and sometimes smaller):
>
> 07 Sep 18:39:33.879 [ERROR] dpmpool1.farm.particle.cz BAD SIZE 917504
> 07 Sep 18:40:45.039 [ERROR] dpmpool21.farm.particle.cz BAD SIZE 917504
> 07 Sep 18:41:22.555 [ERROR] dpmpool21.farm.particle.cz BAD SIZE 917504
> 07 Sep 18:41:39.248 [ERROR] dpmpool21.farm.particle.cz BAD SIZE 917504
> 07 Sep 18:56:51.113 [ERROR] dpmpool24.farm.particle.cz BAD SIZE 917504
> 07 Sep 18:58:04.436 [ERROR] dpmpool1.farm.particle.cz BAD SIZE 393194936
> 07 Sep 19:01:25.908 [ERROR] dpmpool19.farm.particle.cz BAD SIZE 139703410778112
> 07 Sep 19:02:17.183 [ERROR] dpmpool24.farm.particle.cz BAD SIZE 0
> 07 Sep 19:02:53.014 [ERROR] dpmpool16.farm.particle.cz BAD SIZE 917504
> 07 Sep 19:03:49.424 [ERROR] dpmpool1.farm.particle.cz BAD SIZE 917504
> 07 Sep 19:05:47.743 [ERROR] dpmpool21.farm.particle.cz BAD SIZE 13983727
> 07 Sep 19:15:15.761 [ERROR] dpmpool22.farm.particle.cz BAD SIZE 12752345
> 07 Sep 19:16:58.898 [ERROR] dpmpool20.farm.particle.cz BAD SIZE 917504
> 07 Sep 19:18:51.590 [ERROR] dpmpool17.farm.particle.cz BAD SIZE 6302080
> 07 Sep 19:20:37.006 [ERROR] dpmpool21.farm.particle.cz BAD SIZE 1456153896
> 07 Sep 19:52:32.553 [ERROR] dpmpool16.farm.particle.cz BAD SIZE 917504
> 07 Sep 19:53:45.155 [ERROR] dpmpool22.farm.particle.cz BAD SIZE 1000000000
> 07 Sep 20:11:14.497 [ERROR] dpmpool22.farm.particle.cz BAD SIZE 917504
> 07 Sep 20:12:51.081 [ERROR] dpmpool20.farm.particle.cz BAD SIZE 917504
> 07 Sep 20:14:58.090 [ERROR] dpmpool20.farm.particle.cz BAD SIZE 100663296
> 07 Sep 20:24:57.457 [ERROR] dpmpool1.farm.particle.cz BAD SIZE 917504
> 07 Sep 20:27:37.596 [ERROR] dpmpool19.farm.particle.cz BAD SIZE 917504
> 07 Sep 20:28:21.617 [ERROR] dpmpool16.farm.particle.cz BAD SIZE 0
>
> Petr
>
> On 9/5/19 2:44 PM, Petr Vokac wrote:
>> I'm going to test uploads in a loop with this code, with gfal debug
>> logging enabled, from our remote machines that talk to the storage
>> through a quite saturated network (we never saw the same issue from
>> cluster worker nodes that share a room with the diskservers). I also saw
>> a few transfers that did not create a big sparse file but just failed
>> while comparing checksums after the upload, which is also quite
>> suspicious behavior.
>>
>> Petr
>>
>> On 9/5/19 12:57 PM, andrea manzi wrote:
>>>> Ciao Andrea,
>>>>
>>>> we practically do this:
>>>>
>>>>     ctx = creat_context(...
>>>>     ctx.set_opt_string("XROOTD PLUGIN", "XRD.WANTPROT", "gsi,unix")
>>>>     ctx.mkdir_rec(...
>>>>     ctx.filecopy(..
>>>>
>>>> Is that the gfal transfer API?
>>> yup, internally in gfal2 we then use XrdCl to perform the copy
>>> (both for local files and TPC)
>>>
>>> today is a holiday, but we can start investigating the possible issue
>>> with Michal tomorrow or on Monday
>>>
>>> cheers
>>>
>>> Andrea
>>>
>>>> Cheers,
>>>> Mario
>>>>
>>>>
>>>>
>>>> On 05/09/2019 10:35, andrea manzi wrote:
>>>>> ciao Mario,
>>>>>
>>>>> I guess you are using the gfal2 transfer API in Rucio, right? (the
>>>>> equivalent of gfal-copy)
>>>>>
>>>>> In gfal2, in the case of the transfer API for xrootd, we don't do much:
>>>>> we just use the xrootd client (XrdCl::CopyProcess), so we don't
>>>>> touch file offsets.
>>>>>
>>>>> So we may want to investigate whether there is a problem in XrdCl. Do
>>>>> you have the logs of the client?
>>>>>
>>>>> If you are using the gfal2 POSIX API instead, we need to investigate a
>>>>> possible bug there.
>>>>>
>>>>> cheers
>>>>>
>>>>> Andrea
>>>>>
>>>>>
>>>>>> Hi Petr, Andy,
>>>>>>
>>>>>> one extra piece of information here: Rucio uses the python bindings to
>>>>>> the gfal library to upload these files, and de facto always uses the
>>>>>> current stable gfal version that is published on ALRB. Rucio doesn't
>>>>>> implement any other upload by itself.
>>>>>>
>>>>>> Cheers,
>>>>>> Mario
>>>>>>
>>>>>> On 05/09/2019 04:21, Andrew Hanushevsky wrote:
>>>>>>> Hi Petr,
>>>>>>> So, look at this line:
>>>>>>>
>>>>>>> 190904 17:02:16 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 1784104010@6506989258351653185 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>>
>>>>>>> Why did the application doing the write specify an offset of
>>>>>>> 6506989258351653185? So, that is the problem and, as you noted, it
>>>>>>> doesn’t matter whether POSC is on or not. Indeed, it’s the first write
>>>>>>> to the file, and only the first write, that suffers from this
>>>>>>> problem. So, there are two possibilities here: a) there is a bug in
>>>>>>> the uploader, or b) there was network data corruption that managed to
>>>>>>> hit the offset (somewhat unlikely).
>>>>>>> Andy
>>>>>>>
>>>>>>> From: Petr Vokac <mailto:[log in to unmask]>
>>>>>>> Sent: Wednesday, September 04, 2019 12:19 PM
>>>>>>> To: Andrew Hanushevsky <mailto:[log in to unmask]>
>>>>>>> Cc: dpm-devel <mailto:[log in to unmask]>; [log in to unmask] <mailto:[log in to unmask]>
>>>>>>> Subject: Re: Uploaded file 128TB on filesystem
>>>>>>>
>>>>>>> I configured our diskservers with "ofs.trace write" (I have not yet
>>>>>>> added "ofs.persist hold 0" to the xrootd configuration) and today a
>>>>>>> new file appeared on one of our storage servers:
>>>>>>>
>>>>>>> [root@dpmpool1 ~]# grep atlasprd.2817:152 /var/log/xrootd/dpmdisk/xrootd.log
>>>>>>> 190904 17:02:08 1894 XrootdXeq: atlasprd.2817:152@[2001:718:401:6e03::1:82] pub IP64 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1
>>>>>>> 190904 17:02:16 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 1784104010@6506989258351653185 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:02:16 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 262144@262144 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:02:17 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 262144@524288 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:02:17 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 262144@786432 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:02:17 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 262144@1048576 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:02:17 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 262144@1310720 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:02:17 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 262144@1572864 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:02:17 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 262144@1835008 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:02:17 22915 atlasprd.2817:152@[2001:718:401:6e03::1:82] ofs_write: 135000@2097152 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_5TeV/90/9f/EVNT.18960107._035789.pool.root.1.rucio.upload
>>>>>>> 190904 17:07:32 2163 XrootdXeq: atlasprd.2817:152@[2001:718:401:6e03::1:82] disc 0:05:24
>>>>>>>
>>>>>>> [root@dpmpool1 ~]# ls -la /mnt/fs2/atlas/2019-09-04/EVNT.18960107._035789.pool.root.1.rucio.upload.309650.1567609328
>>>>>>> -rw-rw---- 1 dpmmgr dpmmgr 6506989258351919104 Sep  4 17:02 /mnt/fs2/atlas/2019-09-04/EVNT.18960107._035789.pool.root.1.rucio.upload.309650.1567609328
>>>>>>>
>>>>>>> [root@dpmpool1 ~]# du --block-size=1 /mnt/fs2/atlas/2019-09-04/EVNT.18960107._035789.pool.root.1.rucio.upload.309650.1567609328
>>>>>>> 2236416    /mnt/fs2/atlas/2019-09-04/EVNT.18960107._035789.pool.root.1.rucio.upload.309650.1567609328
>>>>>>>
>>>>>>> This file has the first 256kB filled with 0x00 (the adler32 up to the
>>>>>>> original size 2232152 doesn't match) and this time there is no
>>>>>>> ofs_Posc in the xrootd.log file.
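>>>>>>>
>>>>>>> (A sparse file like this can be spotted by comparing the apparent size
>>>>>>> with the allocated blocks, which is what du shows; a minimal sketch:)
>>>>>>>
>>>>>>> import os
>>>>>>>
>>>>>>> def is_sparse(path):
>>>>>>>     st = os.stat(path)
>>>>>>>     # st_blocks counts 512-byte units actually allocated on disk
>>>>>>>     return st.st_blocks * 512 < st.st_size
>>>>>>>
>>>>>>> print(is_sparse("/mnt/fs2/atlas/2019-09-04/EVNT.18960107._035789.pool.root.1.rucio.upload.309650.1567609328"))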
>>>>>>>
>>>>>>> Petr Vokac
>>>>>>>
>>>>>>> On 8/19/19 3:24 AM, Andrew Hanushevsky wrote:
>>>>>>>> Hi Petr,
>>>>>>>>
>>>>>>>> The issue here is that the python script (I believe it's python) is
>>>>>>>> disconnecting without closing the file and then reconnecting and
>>>>>>>> writing at a high offset. The log is pretty clear on that. You can
>>>>>>>> verify by adding "ofs.trace write" to see the write offsets (though
>>>>>>>> it will produce a lot of output). You can also prohibit reconnection
>>>>>>>> by adding this to the persist directive:
>>>>>>>>
>>>>>>>> ofs.persist hold 0
>>>>>>>>
>>>>>>>> which will delete the file the moment the client disconnects without
>>>>>>>> closing the file.
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> On Sat, 17 Aug 2019, Petr Vokac wrote:
>>>>>>>>
>>>>>>>>> Today a new "65PB" file appeared on our storage, and it was again
>>>>>>>>> transferred with the xrootd protocol. I can see the same error again
>>>>>>>>> in the log file:
>>>>>>>>>
>>>>>>>>> [root@dpmpool21 ~]# ls -la /mnt/fs4/atlas/2019-08-17/EVNT.18893579._003197.pool.root.1.rucio.upload.234855.1566031522
>>>>>>>>> -rw-rw---- 1 dpmmgr dpmmgr 72340172838077281 Aug 17 10:48 /mnt/fs4/atlas/2019-08-17/EVNT.18893579._003197.pool.root.1.rucio.upload.234855.1566031522
>>>>>>>>>
>>>>>>>>> [root@dpmpool21 ~]# grep atlasprd.2757 /var/log/xrootd/dpmdisk/xrootd.log
>>>>>>>>> ...
>>>>>>>>> 190817 10:47:55 57803 XrootdXeq: atlasprd.2757:65@[2001:718:401:6e03::1:43] pub IP64 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1
>>>>>>>>> 190817 10:47:55 61859 XrootdXeq: atlasprd.2757:66@[2001:718:401:6e03::1:83] pub IP64 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1
>>>>>>>>> 190817 10:48:52 62233 XrootdXeq: atlasprd.2757:46@[2001:718:401:6e03::1:72] disc 0:00:58
>>>>>>>>> 190817 10:48:52 62954 XrootdXeq: atlasprd.2757:25@[2001:718:401:6e03::1:72] disc 0:03:30
>>>>>>>>> 190817 10:48:52 57804 XrootdXeq: atlasprd.2757:28@[2001:718:401:6e03::1:72] pub IP64 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1
>>>>>>>>> 190817 10:48:52 57804 ofs_Posc: Creator changed from atlasprd.2757:25@[2001:718:401:6e03::1:72] to atlasprd.2757:28@[2001:718:401:6e03::1:72] for /dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_13TeV/8b/f3/EVNT.18893579._003197.pool.root.1.rucio.upload
>>>>>>>>> 190817 10:48:53 57803 XrootdXeq: atlasprd.2757:46@[2001:718:401:6017:20:0:14:44] pub IP64 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1
>>>>>>>>> 190817 10:49:01 57818 XrootdXeq: atlasprd.2757:70@[2001:718:401:6017:20:0:11:11] pub IP64 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1
>>>>>>>>> 190817 10:49:07 62954 XrootdXeq: atlasprd.2757:55@[2001:718:401:6e03::1:81] disc 0:10:17
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> Any suggestion on how to trace what is causing this behavior?
>>>>>>>>>
>>>>>>>>> Petr
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7/20/19 7:48 AM, Petr Vokac wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I ran the command you sent me
>>>>>>>>>>
>>>>>>>>>> grep "atlasprd.2757:" <logfile> | egrep
>>>>>>>>>> '2001:718:401:6e03::1:21|2001:718:401:6e03::1:32' > <excerpt>
>>>>>>>>>>
>>>>>>>>>> and because the output was more than just a few lines long, I
>>>>>>>>>> uploaded it to:
>>>>>>>>>>
>>>>>>>>>> https://vokac.web.cern.ch/vokac/tmp/dpmpool22.xrootd.log
>>>>>>>>>> https://vokac.web.cern.ch/vokac/tmp/dpmpool24.xrootd.log
>>>>>>>>>>
>>>>>>>>>> These machines on the 2001:718:401:6e03::/64 subnet are in a remote
>>>>>>>>>> location (but relatively close, with RTT ~ 0.7 ms). I can see
>>>>>>>>>> several ofs_Posc occurrences in the logs of our disknodes during
>>>>>>>>>> the "190717 17:xx:xx" time period:
>>>>>>>>>>
>>>>>>>>>> http://vokac.web.cern.ch/vokac/tmp/POSC.txt
>>>>>>>>>>
>>>>>>>>>> Petr
>>>>>>>>>>
>>>>>>>>>> On 7/19/19 8:51 PM, Andrew Hanushevsky wrote:
>>>>>>>>>>> Hi Petr,
>>>>>>>>>>>
>>>>>>>>>>> Well, POSC says that the client reconnected for writing, so not
>>>>>>>>>>> all went well. Can I get the log output for that occurrence, as I
>>>>>>>>>>> previously described?
>>>>>>>>>>>
>>>>>>>>>>> Andy
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 19 Jul 2019, Petr Vokac wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We are using the Rucio download client to transfer job outputs
>>>>>>>>>>>> (it is built on top of the gfal2 python interface).
>>>>>>>>>>>>
>>>>>>>>>>>> https://vokac.web.cern.ch/vokac/tmp/dpmpool22.xrootd.log
>>>>>>>>>>>> https://vokac.web.cern.ch/vokac/tmp/dpmpool24.xrootd.log
>>>>>>>>>>>>
>>>>>>>>>>>> This is the full job output, unfortunately with minimal gfal2
>>>>>>>>>>>> logging output:
>>>>>>>>>>>>
>>>>>>>>>>>> https://vokac.web.cern.ch/vokac/tmp/grid.2258136.4.out-from-2001:718:401:6e03::1:21-to-dpmpool22.txt
>>>>>>>>>>>> https://vokac.web.cern.ch/vokac/tmp/grid.2255921.7.out-from-2001:718:401:6e03::1:32-to-dpmpool24.txt
>>>>>>>>>>>> It looks like, according to gfal2, the transfer was OK:
>>>>>>>>>>>>
>>>>>>>>>>>> 2019-07-17 15:09:17,782 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_xrootd.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,782 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_rfio.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,783 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,783 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_http.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,783 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_lfc.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,783 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_file.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,784 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_sftp.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,784 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_dcap.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,784 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_srm.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,793 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_xrootd.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,793 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_rfio.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,793 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,794 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_http.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,794 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_lfc.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,794 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_file.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,794 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_sftp.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,794 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_dcap.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,794 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_srm.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,795 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_xrootd.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,795 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_rfio.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,795 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,795 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_http.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,796 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_lfc.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,796 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_file.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,796 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_sftp.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,796 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_dcap.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,796 | INFO     | copytool_out        | gfal2                            | connect                   | [gfal_module_load] plugin /usr/lib64/gfal2-plugins//libgfal_plugin_srm.so loaded with success
>>>>>>>>>>>> 2019-07-17 15:09:17,810 | INFO     | copytool_out        | gfal2                            | __gfal2_copy              | Event triggered: BOTH GFAL2:CORE:COPY LIST:ENTER
>>>>>>>>>>>> 2019-07-17 15:09:17,810 | INFO     | copytool_out        | gfal2                            | __gfal2_copy              | Event triggered: BOTH GFAL2:CORE:COPY LIST:ITEM file:///scratch/condor/dir_24031/atlas_HudmF2he/PanDA_Pilot-4419868006/log.18636890._001059.job.log.tgz.1 => root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_13TeV/26/98/log.18636890._001059.job.log.tgz.1.rucio.upload
>>>>>>>>>>>> 2019-07-17 15:09:17,810 | INFO     | copytool_out        | gfal2                            | __gfal2_copy              | Event triggered: BOTH GFAL2:CORE:COPY LIST:EXIT
>>>>>>>>>>>> 2019-07-17 15:09:17,811 | INFO     | Dummy-2             | gfal2                            | (unknown function)        | Event triggered: BOTH xroot TRANSFER:ENTER file://localhost///scratch/condor/dir_24031/atlas_HudmF2he/PanDA_Pilot-4419868006/log.18636890._001059.job.log.tgz.1?xrd.gsiusrpxy=/scratch/condor/dir_24031/job.9XONDm6yx7unnoBGSqYX2MjnABFKDmABFKDmSyhTDmEBFKDmqAGN3n.proxy&xrd.wantprot=gsi,unix => root://golias100.farm.particle.cz:1094///dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_13TeV/26/98/log.18636890._001059.job.log.tgz.1.rucio.upload?xrd.gsiusrpxy=/scratch/condor/dir_24031/job.9XONDm6yx7unnoBGSqYX2MjnABFKDmABFKDmSyhTDmEBFKDmqAGN3n.proxy&x
>>>>>>>>>>>> 2019-07-17 15:09:17,811 | INFO     | Dummy-2             | gfal2                            | (unknown function)        | Event triggered: BOTH xroot TRANSFER:TYPE streamed
>>>>>>>>>>>> 2019-07-17 15:11:23,914 | INFO     | Dummy-2             | gfal2                            | (unknown function)        | Event triggered: BOTH xroot TRANSFER:EXIT Job finished, [SUCCESS]
>>>>>>>>>>>>
>>>>>>>>>>>> but Rucio then probably tries to verify the checksum (gfal-sum
>>>>>>>>>>>> via python bindings) and it fails:
>>>>>>>>>>>>
>>>>>>>>>>>> 2019-07-17 15:15:23,774 | INFO     | copytool_out        | pilot.copytool.rucio             | copy_out                  | stderr = An unknown exception occurred.
>>>>>>>>>>>> Details: Checksum not validated
>>>>>>>>>>>>
>>>>>>>>>>>> because DPM is unable to calculate the checksum of a 128TB file
>>>>>>>>>>>> within the timeout.
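>>>>>>>>>>>>
>>>>>>>>>>>> (The failing verification is essentially this call; a minimal
>>>>>>>>>>>> sketch using the gfal2 python bindings:)
>>>>>>>>>>>>
>>>>>>>>>>>> import gfal2
>>>>>>>>>>>>
>>>>>>>>>>>> ctx = gfal2.creat_context()
>>>>>>>>>>>> # server-side adler32, the equivalent of `gfal-sum <url> adler32`;
>>>>>>>>>>>> # for a 128TB sparse file this cannot complete within the timeout
>>>>>>>>>>>> print(ctx.checksum("root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_13TeV/26/98/log.18636890._001059.job.log.tgz.1.rucio.upload", "adler32"))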
>>>>>>>>>>>>
>>>>>>>>>>>> Petr
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/19/19 5:07 AM, Andrew Hanushevsky wrote:
>>>>>>>>>>>>> Hi Petr,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Indeed, it looks like whatever was used to upload the files
>>>>>>>>>>>>> (xrdcp?) created a very large sparse file by writing at a very
>>>>>>>>>>>>> large offset. The clue here is that an anomaly occurred during
>>>>>>>>>>>>> the write, and these two messages (likely one corresponding to
>>>>>>>>>>>>> each file) tell us something....
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 190717 17:10:39 332092 ofs_Posc: Creator changed from atlasprd.2757:90@[2001:718:401:6e03::1:21] to atlasprd.2757:39@[2001:718:401:6e03::1:21] for /dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_13TeV/26/98/log.18636890._001059.job.log.tgz.1.rucio.upload
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 190717 17:29:26 14204 ofs_Posc: Creator changed from atlasprd.2757:33@[2001:718:401:6e03::1:32] to atlasprd.2757:95@[2001:718:401:6e03::1:32] for /dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_5TeV/09/bc/AOD.18595963._000786.pool.root.1.rucio.upload
>>>>>>>>>>>>>
>>>>>>>>>>>>> What the POSC system is saying is that in each case the original
>>>>>>>>>>>>> creator of the file lost contact with the server. It then
>>>>>>>>>>>>> reconnected and resumed writing, with a successful close. POSC
>>>>>>>>>>>>> allows this so that a network glitch does not cause a file to be
>>>>>>>>>>>>> deleted: a glitch will cause the client to reconnect and resume
>>>>>>>>>>>>> where it left off. This is only allowed when the resuming client
>>>>>>>>>>>>> comes from the same IP address, has the same process ID, and has
>>>>>>>>>>>>> the same logical name. In this case it was "atlasprd.2757". I
>>>>>>>>>>>>> find that rather odd because it's next to impossible for two
>>>>>>>>>>>>> processes on two different machines to have the same process
>>>>>>>>>>>>> number (i.e. 2757). So, this is why I am wondering whether this
>>>>>>>>>>>>> was really xrdcp or some other program.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should look to see what these clients were actually doing
>>>>>>>>>>>>> during this period, as they clearly did not do the right thing
>>>>>>>>>>>>> when they reconnected. So, could you grep out all interactions
>>>>>>>>>>>>> of these two clients from the log, something like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> grep "atlasprd.2757:" <logfile> | egrep
>>>>>>>>>>>>> '2001:718:401:6e03::1:21|2001:718:401:6e03::1:32' > <excerpt>
>>>>>>>>>>>>>
>>>>>>>>>>>>> and send the excerpt to me?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 18 Jul 2019, Petr Vokac wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> our DPM, which is using native xrootd 4.9.1, yesterday created
>>>>>>>>>>>>>> two different 128TB files on two different diskservers; they
>>>>>>>>>>>>>> were received as job output from two different worker nodes.
>>>>>>>>>>>>>> The size of each original file was a few megabytes, and when I
>>>>>>>>>>>>>> check the beginning of the stored files up to the original size
>>>>>>>>>>>>>> & checksum, everything seems to be OK - except these are sparse
>>>>>>>>>>>>>> files with a size of 128TB. The original transferred files from
>>>>>>>>>>>>>> the WNs looked like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mc15_13TeV:log.18636890._001059.job.log.tgz.1 - original size
>>>>>>>>>>>>>> 3101899, adler32 742e1336
>>>>>>>>>>>>>> mc16_5TeV:AOD.18595963._000786.pool.root.1 - original size
>>>>>>>>>>>>>> 737474514,
>>>>>>>>>>>>>> adler32 dc68cb21
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> but on our diskservers they were stored as
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@dpmpool22 ~]# ls -l /mnt/fs1/atlas/2019-07-17/log.18636890._001059.job.log.tgz.1.rucio.upload.209928.1563376157
>>>>>>>>>>>>>> -rw-rw---- 1 dpmmgr dpmmgr 140560257079236 Jul 17 17:11 /mnt/fs1/atlas/2019-07-17/log.18636890._001059.job.log.tgz.1.rucio.upload.209928.1563376157
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@dpmpool22 ~]# du --block-size=1 /mnt/fs1/atlas/2019-07-17/log.18636890._001059.job.log.tgz.1.rucio.upload.209928.1563376157
>>>>>>>>>>>>>> 3108864    /mnt/fs1/atlas/2019-07-17/log.18636890._001059.job.log.tgz.1.rucio.upload.209928.1563376157
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@dpmpool24 ~]# ls -l /mnt/fs2/atlas/2019-07-17/AOD.18595963._000786.pool.root.1.rucio.upload.210480.1563377083
>>>>>>>>>>>>>> -rw-rw---- 1 dpmmgr dpmmgr 139734345056408 Jul 17 17:31 /mnt/fs2/atlas/2019-07-17/AOD.18595963._000786.pool.root.1.rucio.upload.210480.1563377083
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@dpmpool24 tmp]# du --block-size=1 /mnt/fs2/atlas/2019-07-17/AOD.18595963._000786.pool.root.1.rucio.upload.210480.1563377083
>>>>>>>>>>>>>> 737480704    /mnt/fs2/atlas/2019-07-17/AOD.18595963._000786.pool.root.1.rucio.upload.210480.1563377083
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was told that DPM is not really involved in transferring /
>>>>>>>>>>>>>> storing files with the xrootd protocol, so I guess something
>>>>>>>>>>>>>> bad must have happened inside the XRootD libraries. Do you have
>>>>>>>>>>>>>> any idea why these two transfers generated such extreme files?
>>>>>>>>>>>>>> Unfortunately our xrootd logging level is pretty low and I was
>>>>>>>>>>>>>> only able to find info about the file creation:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@dpmpool22 ~]# grep log.18636890._001059.job.log.tgz /var/log/xrootd/dpmdisk/xrootd.log
>>>>>>>>>>>>>> 190717 17:10:39 332092 ofs_Posc: Creator changed from atlasprd.2757:90@[2001:718:401:6e03::1:21] to atlasprd.2757:39@[2001:718:401:6e03::1:21] for /dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc15_13TeV/26/98/log.18636890._001059.job.log.tgz.1.rucio.upload
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@dpmpool24 tmp]# grep AOD.18595963._000786.pool.root /var/log/xrootd/dpmdisk/xrootd.log
>>>>>>>>>>>>>> 190717 17:29:26 14204 ofs_Posc: Creator changed from atlasprd.2757:33@[2001:718:401:6e03::1:32] to atlasprd.2757:95@[2001:718:401:6e03::1:32] for /dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/mc16_5TeV/09/bc/AOD.18595963._000786.pool.root.1.rucio.upload
>>>>>>>>>>>>>> Petr
>>>>>>>>>>>>>>