Hi Nikolai,

Thank you, this helped. Andy and I have figured it out ... it is a bug 
(uninitialized variable) in xcache that seems to get activated when 
authorization is configured.

We're working on a fix / workaround for 5.0.0.

Thank you for your patience and help ... and sorry for the trouble.

Cheers,
Matevz

On 2020-07-09 08:28, Nikolai Hartmann wrote:
> Hi,
> 
> Unfortunately it seems to occur also for release 5 (5.0.0-1). I tried
> with this container:
> 
> https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/v5/xcache.singularity.def
> 
> using basically the same configuration as before, except that I
> commented out the `namelib` directive, since the rucioN2N plugin rpm I
> was using required xrootd < 5.0.0-1.
> 
> In case it helps, here is the new image:
> 
> https://cloud.physik.lmu.de/index.php/s/4AcEcFRPwXqZb9b
> 
> Thanks,
> Nikolai
> 
> On 7/9/20 1:31 AM, Matevz Tadel wrote:
>> On 2020-07-08 15:00, Andrew Hanushevsky wrote:
>>> Could you try git head? There were issues in RC4. There might still be
>>> issues, but I never saw the one you and Nikolai tripped over.
>>
>> Yes, I managed to reproduce it on the rc4 tag with just basic gsi
>> enabled ... on master / 5.0.0 it doesn't happen.
>>
>> Nikolai, please update to 5.0.0.
>>
>> Matevz
>>
>>> Andy
>>>
>>>
>>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>>
>>>> On 2020-07-08 14:19, Andrew Hanushevsky wrote:
>>>>> What release? Git head?
>>>>
>>>> This was from Nikolai's image, 5-rc4.
>>>>
>>>> \m
>>>>
>>>>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>>>>
>>>>>> Hi Andy,
>>>>>>
>>>>>> On 2020-07-08 13:49, Andrew Hanushevsky wrote:
>>>>>>> Hi Matevz,
>>>>>>>
>>>>>>> Well, what kind of authentication? Clearly, the kind we use
>>>>>>> doesn't cause this problem. It could be just a random core smash,
>>>>>>> but if it's random we should be seeing various effects, not just
>>>>>>> a crash in this particular code path, right?
>>>>>>
>>>>>> xcache without any security config: everything works smoothly.
>>>>>>
>>>>>> xcache with sec.protocol /usr/lib64 gsi --- trouble:
>>>>>>
>>>>>> 200708 13:38:44 240995 XrootdXeq: matevz.241046:31@uaf-7 pub IPv4
>>>>>> login as d0ba0e6c.0
>>>>>> 200708 13:38:44 240995 Posix_P2L: file
>>>>>> /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>>> pfn2lfn
>>>>>> /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>>>
>>>>>> [2020-07-08 13:38:44.739012 -0700][Error  ][AsyncSock         ]
>>>>>> [[log in to unmask]:1094.0] Unable to connect: network is
>>>>>> unreachable
>>>>>> [2020-07-08 13:38:44.739092 -0700][Error  ][PostMaster        ]
>>>>>> [[log in to unmask]:1094] elapsed = 0, pConnectionWindow = 120
>>>>>> seconds.
>>>>>> [2020-07-08 13:38:45.637583 -0700][Error  ][XRootDTransport   ]
>>>>>> [[log in to unmask]:1094.0] Authentication with gsi failed:
>>>>>> [2020-07-08 13:38:45.974332 -0700][Error  ][AsyncSock         ]
>>>>>> [[log in to unmask]:1095.0] Unable to connect: network is
>>>>>> unreachable
>>>>>> [2020-07-08 13:38:45.974400 -0700][Error  ][PostMaster        ]
>>>>>> [[log in to unmask]:1095] elapsed = 0, pConnectionWindow =
>>>>>> 120 seconds.
>>>>>> 200708 13:38:46 240995 XrdPfc_Manager: info Cache::Attach()
>>>>>> root:[log in to unmask]
>>>>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::GetFile
>>>>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root,
>>>>>> io 0xe30f50
>>>>>> 200708 13:38:46 240995 XrdPfc_IO: debug
>>>>>> IOEntireFile::initCachedStat get stat from client res = 0, size =
>>>>>> 2272072
>>>>>> root:[log in to unmask]
>>>>>> 200708 13:38:46 240995 XrdPfc_File: debug Creating new file info,
>>>>>> data size = 2272072 num blocks = 3
>>>>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::inc_ref_cnt
>>>>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root,
>>>>>> cnt at exit = 1
>>>>>> 200708 13:38:46 240995 XrdPfc_File: debug File::AddIO() io =
>>>>>> 0xe30f50
>>>>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::Attach()
>>>>>> root:[log in to unmask]
>>>>>> location: [log in to unmask]:1095
>>>>>> [2020-07-08 13:38:47.022428 -0700][Error  ][AsyncSock         ]
>>>>>> [[log in to unmask]:1095.0] Socket error encountered:
>>>>>> [ERROR] Invalid arguments
>>>>>> [2020-07-08 13:38:47.022506 -0700][Error  ][XRootD            ]
>>>>>> [[log in to unmask]:1095] Unable to get the response to
>>>>>> request kXR_read (handle: 0x00000000, offset: 0, size: 1048576)
>>>>>> [2020-07-08 13:38:47.022625 -0700][Error  ][File              ]
>>>>>> [0xf0b040@root:[log in to unmask]:1094//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root?xrdcl.requuid=6730e10b-8b40-43bf-9d0a-75da982939e8]
>>>>>> Fatal file state error. Message kXR_read (handle: 0x00000000,
>>>>>> offset: 0, size: 1048576) returned with [ERROR] Invalid arguments
>>>>>> 200708 13:38:47 241052 XrdPfc_File: error
>>>>>> File::ProcessBlockResponse block 0xff3440  0 error=-22
>>>>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>>> 200708 13:38:47 240995 XrdPfc_File: error File::Read() io 0xe30f50,
>>>>>> block 0 finished with error 22 invalid argument
>>>>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>>> src/tcmalloc.cc:284] Attempt to free invalid pointer
>>>>>> 0x313262003543620a
>>>>>>
>>>>>> Note that while we get all these network errors at the start, cache
>>>>>> still got the stat info from the server (knows the size of the file).
>>>>>>
>>>>>> I must admit I never test xcache with auth on :( I'll try it out
>>>>>> now, well, after lunch :)
>>>>>>
>>>>>> Matevz
>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>>>>>>
>>>>>>>> Yay, that was a journey ... but I can reproduce it now!
>>>>>>>>
>>>>>>>> It is super strange that this happens only when xcache has
>>>>>>>> authentication on ... this really should have no effect. I first
>>>>>>>> tried without it and it worked, and then something rang a bell
>>>>>>>> that you said so in the email :).
>>>>>>>>
>>>>>>>> Andy, does this ring any bells for you? It looks like an
>>>>>>>> interaction between server / client usage of X509 stuff.
>>>>>>>>
>>>>>>>> Anyway, I'm digging on the xcache side ...
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Matevz
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020-07-08 07:41, Nikolai Hartmann wrote:
>>>>>>>>> Hi Matevz,
>>>>>>>>>
>>>>>>>>> I might have something like a "minimal failing example".
>>>>>>>>> Unfortunately
>>>>>>>>> the problem only appears when authentication is required, so the
>>>>>>>>> example
>>>>>>>>> will only work on a machine that has a valid host certificate
>>>>>>>>> and the
>>>>>>>>> corresponding directory has to be bind-mounted into the container.
>>>>>>>>>
>>>>>>>>> I uploaded my container image here:
>>>>>>>>>
>>>>>>>>> https://cloud.physik.lmu.de/index.php/s/RFC6Q89FBxxNMXF
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> and made a directory structure (tar archive attached) to bind
>>>>>>>>> mount into
>>>>>>>>> the container (and containing the minimal failing xcache config
>>>>>>>>> and a
>>>>>>>>> script for starting gdb inside the container)
>>>>>>>>>
>>>>>>>>> To reproduce, extract the archive, enter the directory and run (as
>>>>>>>>> non-root user)
>>>>>>>>>
>>>>>>>>> singularity run -B $(pwd)/data:/data \
>>>>>>>>>   -B $(pwd)/config:/etc/xrootd:ro \
>>>>>>>>>   -B <hostkey-dir>:/etc/grid-security:ro <singularity-image>
>>>>>>>>>
>>>>>>>>> where <hostkey-dir> is a directory that contains
>>>>>>>>>
>>>>>>>>> hostkey.pem
>>>>>>>>> hostcert.pem
>>>>>>>>> vomsdir (will become X509_VOMS_DIR)
>>>>>>>>> certificates (will become X509_CERT_DIR)
>>>>>>>>>
>>>>>>>>> and <singularity-image> is the path to the singularity image.
>>>>>>>>>
>>>>>>>>> That should run xrootd and the log should appear in
>>>>>>>>> data/xrd/var/log/xrootd.log
>>>>>>>>>
>>>>>>>>> I used this example to produce the failure:
>>>>>>>>>
>>>>>>>>> xrdcp -f \
>>>>>>>>>   root://lcg-lrz-xcache0.grid.lrz.de:1094//root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root \
>>>>>>>>>   /dev/null
>>>>>>>>>
>>>>>>>>> The simplest way to run gdb seemed to be to start xrootd
>>>>>>>>> directly under gdb. This can be done with the script
>>>>>>>>> run_xcache_debug.sh in the attached archive. Instead of the
>>>>>>>>> command above just use
>>>>>>>>>
>>>>>>>>> singularity exec -B $(pwd)/data:/data \
>>>>>>>>>   -B $(pwd)/config:/etc/xrootd:ro \
>>>>>>>>>   -B <hostkey-dir>:/etc/grid-security:ro <singularity-image> \
>>>>>>>>>   ./run_xcache_debug.sh
>>>>>>>>>
>>>>>>>>> Note: Before restarting, it is best to delete the content of the
>>>>>>>>> data directory, since the bug also did not seem to occur when
>>>>>>>>> the file was already cached (e.g. after testing without
>>>>>>>>> authentication).
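
The two singularity invocations above differ only in run vs. exec and the trailing debug script, so they can be sketched as one small helper. This is only an illustration: the function name build_xcache_cmd and its arguments are hypothetical and not part of the attached archive, and it just prints the command rather than running it. It assumes it is called from the extracted archive directory and that paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not from the attached archive): prints the
# singularity command for the reproduction steps described above.
#   $1 = <hostkey-dir>, $2 = <singularity-image>, $3 = "run" (default) or "debug"
build_xcache_cmd() {
    hostkey_dir=$1
    image=$2
    mode=${3:-run}
    base=$(pwd)   # assumed to be the extracted archive directory

    # Same three bind mounts as in the commands quoted above.
    binds="-B $base/data:/data -B $base/config:/etc/xrootd:ro"
    binds="$binds -B $hostkey_dir:/etc/grid-security:ro"

    if [ "$mode" = "debug" ]; then
        # Debug mode starts xrootd under gdb via the script from the archive.
        echo "singularity exec $binds $image ./run_xcache_debug.sh"
    else
        echo "singularity run $binds $image"
    fi
}
```

For example, `build_xcache_cmd <hostkey-dir> <singularity-image> debug` prints the gdb variant. Clearing the data directory before a restart is still a manual step, as noted above.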
>>>>>>>>>
>>>>>>>>> Sorry for the overly complicated reproduction steps, but since
>>>>>>>>> it only happened when authentication was enabled I didn't know
>>>>>>>>> how to make it simpler. I hope it helps.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Nikolai
>>>>>>>>>
>>>>>>>>> On 7/7/20 8:42 PM, Matevz Tadel wrote:
>>>>>>>>>> Thanks Nikolai, I shall continue my investigation :)
>>>>>>>>>>
>>>>>>>>>> Matevz
>>>>>>>>>>
>>>>>>>>>> On 2020-07-06 23:59, Nikolai Hartmann wrote:
>>>>>>>>>>> Hi Matevz,
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot for looking into this.
>>>>>>>>>>>
>>>>>>>>>>> - The crash seems to happen always when i make a request
>>>>>>>>>>> - Currently prefetching is disabled
>>>>>>>>>>> - Yes, i think it is direct proxy mode
>>>>>>>>>>> - stack trace is attached
>>>>>>>>>>>
>>>>>>>>>>> A similar setup seems to work for Ilija without issues with
>>>>>>>>>>> the xcaches
>>>>>>>>>>> using slate - i tried to mimic that setup closely. Running
>>>>>>>>>>> xrootd from
>>>>>>>>>>> this container image:
>>>>>>>>>>>
>>>>>>>>>>> https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/xcache.singularity.def
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> using this config
>>>>>>>>>>>
>>>>>>>>>>> https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/etc/xrootd/xcache.cfg
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Nikolai
>>>>>>>>>>>
>>>>>>>>>>> On 7/7/20 1:38 AM, Matevz Tadel wrote:
>>>>>>>>>>>> Hi Nikolai,
>>>>>>>>>>>>
>>>>>>>>>>>> I tried to reproduce it with current master in nearly all ways,
>>>>>>>>>>>> with/without prefetching and with direct/forwarding mode.
>>>>>>>>>>>> Also, with std
>>>>>>>>>>>> malloc and tcmalloc. No luck :(
>>>>>>>>>>>>
>>>>>>>>>>>> Backtrace or core would help a lot at this point.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Matevz
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020-07-03 00:54, Nikolai Hartmann wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm trying to upgrade to xrootd5 rc4 for our xcache server to
>>>>>>>>>>>>> mitigate a
>>>>>>>>>>>>> problem with dCache.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now when i try to read a file through xcache it crashes with
>>>>>>>>>>>>> "Attempt to
>>>>>>>>>>>>> free invalid pointer". I attached the corresponding part of
>>>>>>>>>>>>> the log.
>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Nikolai
>>>>>>>>>>>>>
>>>>>>>>>>>>> ########################################################################
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>>>>>>>
>>>>>>>>>>>>> To unsubscribe from the XROOTD-L list, click the following
>>>>>>>>>>>>> link:
>>>>>>>>>>>>> https://urldefense.com/v3/__https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1__;!!Mih3wA!Xzk53aW-mEg2pavzme9Hd49MPmno8frpbkh2YetRsquNyAt5jiVsDB91pTNUHA$
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
