Could you try git head? There were issues in RC4. There might still be
issues, but I never saw the one you and Nikolai tripped over.
Andy
On Wed, 8 Jul 2020, Matevz Tadel wrote:
> On 2020-07-08 14:19, Andrew Hanushevsky wrote:
>> What release? Git head?
>
> This was from Nikolai's image, 5-rc4.
>
> \m
>
>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>
>>> Hi Andy,
>>>
>>> On 2020-07-08 13:49, Andrew Hanushevsky wrote:
>>>> Hi Matevz,
>>>>
>>>> Well, what kind of authentication? Clearly, the kind we use doesn't cause
>>>> this problem. It could be just a random core smash, but if it's random we
>>>> should be seeing various effects, not just a crash in this particular code
>>>> path, right?
>>>
>>> xcache without any security config: everything works smoothly.
>>>
>>> xcache with sec.protocol /usr/lib64 gsi --- trouble:
>>>
>>> 200708 13:38:44 240995 XrootdXeq: matevz.241046:31@uaf-7 pub IPv4 login as
>>> d0ba0e6c.0
>>> 200708 13:38:44 240995 Posix_P2L: file
>>> /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> pfn2lfn
>>> /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> [2020-07-08 13:38:44.739012 -0700][Error ][AsyncSock ]
>>> [[log in to unmask]:1094.0] Unable to connect: network is unreachable
>>> [2020-07-08 13:38:44.739092 -0700][Error ][PostMaster ]
>>> [[log in to unmask]:1094] elapsed = 0, pConnectionWindow = 120 seconds.
>>> [2020-07-08 13:38:45.637583 -0700][Error ][XRootDTransport ]
>>> [[log in to unmask]:1094.0] Authentication with gsi failed:
>>> [2020-07-08 13:38:45.974332 -0700][Error ][AsyncSock ]
>>> [[log in to unmask]:1095.0] Unable to connect: network is
>>> unreachable
>>> [2020-07-08 13:38:45.974400 -0700][Error ][PostMaster ]
>>> [[log in to unmask]:1095] elapsed = 0, pConnectionWindow = 120
>>> seconds.
>>> 200708 13:38:46 240995 XrdPfc_Manager: info Cache::Attach()
>>> root:[log in to unmask]
>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::GetFile
>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root, io
>>> 0xe30f50
>>> 200708 13:38:46 240995 XrdPfc_IO: debug IOEntireFile::initCachedStat get
>>> stat from client res = 0, size = 2272072
>>> root:[log in to unmask]
>>> 200708 13:38:46 240995 XrdPfc_File: debug Creating new file info, data
>>> size = 2272072 num blocks = 3
>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::inc_ref_cnt
>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root,
>>> cnt at exit = 1
>>> 200708 13:38:46 240995 XrdPfc_File: debug File::AddIO() io = 0xe30f50
>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::Attach()
>>> root:[log in to unmask]
>>> location: [log in to unmask]:1095
>>> [2020-07-08 13:38:47.022428 -0700][Error ][AsyncSock ]
>>> [[log in to unmask]:1095.0] Socket error encountered: [ERROR]
>>> Invalid arguments
>>> [2020-07-08 13:38:47.022506 -0700][Error ][XRootD ]
>>> [[log in to unmask]:1095] Unable to get the response to request
>>> kXR_read (handle: 0x00000000, offset: 0, size: 1048576)
>>> [2020-07-08 13:38:47.022625 -0700][Error ][File ]
>>> [0xf0b040@root:[log in to unmask]:1094//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root?xrdcl.requuid=6730e10b-8b40-43bf-9d0a-75da982939e8]
>>> Fatal file state error. Message kXR_read (handle: 0x00000000, offset: 0,
>>> size: 1048576) returned with [ERROR] Invalid arguments
>>> 200708 13:38:47 241052 XrdPfc_File: error File::ProcessBlockResponse block
>>> 0xff3440 0 error=-22
>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> 200708 13:38:47 240995 XrdPfc_File: error File::Read() io 0xe30f50, block
>>> 0 finished with error 22 invalid argument
>>> eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> src/tcmalloc.cc:284] Attempt to free invalid pointer 0x313262003543620a
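One observation on the last log line: the value tcmalloc complains about, 0x313262003543620a, consists almost entirely of printable ASCII bytes. That would be consistent with (though it does not prove) a pointer that was overwritten by string data rather than a random wild value. A minimal sketch for decoding such a value, assuming a little-endian host such as x86_64; the function name decode_ptr_le is made up for this illustration:

```shell
# Print the bytes of a hex pointer value in memory order (little-endian),
# showing printable ASCII as-is and everything else as '.'.
# Note: plain POSIX sh has no 'local', so the helper variables are global.
decode_ptr_le() {
    hex=${1#0x}
    # Left-pad to an even number of hex digits.
    if [ $(( ${#hex} % 2 )) -eq 1 ]; then
        hex="0$hex"
    fi
    out=""
    while [ -n "$hex" ]; do
        # The last two hex digits are the lowest byte, i.e. first in memory.
        rest=${hex%??}
        byte=${hex#"$rest"}
        hex=$rest
        dec=$(( 0x$byte ))
        if [ "$dec" -ge 32 ] && [ "$dec" -lt 127 ]; then
            oct=$(printf '%03o' "$dec")
            out="$out$(printf '%b' "\\0$oct")"
        else
            out="$out."
        fi
    done
    printf '%s\n' "$out"
}
```

Running `decode_ptr_le 0x313262003543620a` prints `.bC5.b21`: a newline, "bC5", a NUL byte and "b21".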
>>>
>>> Note that while we get all these network errors at the start, cache still
>>> got the stat info from the server (knows the size of the file).
>>>
>>> I must admit I never test xcache with auth on :( I'll try it out now,
>>> well, after lunch :)
>>>
>>> Matevz
>>>
>>>> Andy
>>>>
>>>>
>>>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>>>
>>>>> Yay, that was a journey ... but I can reproduce it now!
>>>>>
>>>>> It is super strange that this happens only with xcache with
>>>>> authentication on ... this really should have no effect. I first tried
>>>>> without it and it worked, and then something rang a bell that you said so
>>>>> in the email :).
>>>>>
>>>>> Andy, does this ring any bells for you? It looks like an interaction
>>>>> between server / client usage of X509 stuff.
>>>>>
>>>>> Anyway, I'm digging on the xcache side ...
>>>>>
>>>>> Cheers,
>>>>> Matevz
>>>>>
>>>>>
>>>>>
>>>>> On 2020-07-08 07:41, Nikolai Hartmann wrote:
>>>>>> Hi Matevz,
>>>>>>
>>>>>> I might have something like a "minimal failing example". Unfortunately
>>>>>> the problem only appears when authentication is required, so the
>>>>>> example will only work on a machine that has a valid host certificate,
>>>>>> and the corresponding directory has to be bind-mounted into the
>>>>>> container.
>>>>>>
>>>>>> I uploaded my container image here:
>>>>>>
>>>>>> https://cloud.physik.lmu.de/index.php/s/RFC6Q89FBxxNMXF
>>>>>>
>>>>>>
>>>>>> and made a directory structure (tar archive attached) to bind mount into
>>>>>> the container. It contains the minimal failing xcache config and a
>>>>>> script for starting gdb inside the container.
>>>>>>
>>>>>> To reproduce, extract the archive, enter the directory and run (as
>>>>>> non-root user)
>>>>>>
>>>>>> singularity run -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro \
>>>>>>   -B <hostkey-dir>:/etc/grid-security:ro <singularity-image>
>>>>>>
>>>>>> where <hostkey-dir> is a directory that contains
>>>>>>
>>>>>> hostkey.pem
>>>>>> hostcert.pem
>>>>>> vomsdir (will become X509_VOMS_DIR)
>>>>>> certificates (will become X509_CERT_DIR)
>>>>>>
>>>>>> and <singularity-image> is the path to the singularity image.
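Since a missing entry in <hostkey-dir> only shows up once the container is running, it may be worth checking the directory first. A minimal sketch, assuming hostkey.pem and hostcert.pem are regular files and vomsdir and certificates are directories; the function name check_hostkey_dir is hypothetical and not part of the attached archive:

```shell
# Verify that a host key directory has the four entries listed above.
# Returns 0 if everything is present, 1 otherwise (details go to stderr).
check_hostkey_dir() {
    dir=$1
    missing=0
    for f in hostkey.pem hostcert.pem; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f" >&2
            missing=1
        fi
    done
    for d in vomsdir certificates; do
        if [ ! -d "$dir/$d" ]; then
            echo "missing directory: $dir/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called as `check_hostkey_dir <hostkey-dir> && singularity run ...` so that the container is only started when the directory looks complete.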
>>>>>>
>>>>>> That should run xrootd and the log should appear in
>>>>>> data/xrd/var/log/xrootd.log
>>>>>>
>>>>>> I used this example to produce the failure:
>>>>>>
>>>>>> xrdcp -f \
>>>>>>   root://lcg-lrz-xcache0.grid.lrz.de:1094//root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root \
>>>>>>   /dev/null
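The URL in the xrdcp command above is the cache endpoint followed by the full origin URL. To try other files or origins through the same cache, the pieces can be assembled with a small helper. This is only a sketch of the URL shape used in that command; the function name xcache_url and its argument order are made up for illustration:

```shell
# Build a URL of the form used above:
#   root://<cache-host:port>//root://<origin-host>/<absolute-path>
xcache_url() {
    cache=$1    # cache endpoint, host:port
    origin=$2   # origin host
    path=$3     # absolute path on the origin, starting with '/'
    printf 'root://%s//root://%s/%s\n' "$cache" "$origin" "$path"
}
```

With the values from this thread, `xrdcp -f "$(xcache_url lcg-lrz-xcache0.grid.lrz.de:1094 eospublic.cern.ch /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root)" /dev/null` is the same request as the one above.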
>>>>>>
>>>>>> The simplest way to run gdb seemed to be to start xrootd directly under
>>>>>> gdb. This can be done with the script run_xcache_debug.sh in the attached
>>>>>> archive. Instead of the command above just use
>>>>>>
>>>>>> singularity exec -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro \
>>>>>>   -B <hostkey-dir>:/etc/grid-security:ro <singularity-image> \
>>>>>>   ./run_xcache_debug.sh
>>>>>>
>>>>>> Note: Before restarting, it is best to delete the content of the data
>>>>>> directory, since the bug also did not seem to occur when the file was
>>>>>> already cached (e.g. after testing without authentication).
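For repeated runs, the cleanup step in the note above can be scripted. A minimal sketch, assuming a find that supports -mindepth and -delete (GNU and BSD find do; they are not in POSIX); the function name clear_cache_dir is hypothetical. Note that it also removes the log under data/xrd/var/log, so copy that away first if it is still needed:

```shell
# Remove everything inside the given directory but keep the directory
# itself, so it can still be used as a bind-mount source.
clear_cache_dir() {
    dir=$1
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -mindepth 1 skips the top-level directory; -delete works depth-first.
    find "$dir" -mindepth 1 -delete
}
```

From the extracted archive directory this would be `clear_cache_dir "$(pwd)/data"` before each singularity run.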
>>>>>>
>>>>>> Sorry for the overly complicated reproduction steps, but since it only
>>>>>> happened when authentication was enabled I didn't know how to make it
>>>>>> simpler. I hope it helps.
>>>>>>
>>>>>> Thanks,
>>>>>> Nikolai
>>>>>>
>>>>>> On 7/7/20 8:42 PM, Matevz Tadel wrote:
>>>>>>> Thanks Nikolai, I shall continue my investigation :)
>>>>>>>
>>>>>>> Matevz
>>>>>>>
>>>>>>> On 2020-07-06 23:59, Nikolai Hartmann wrote:
>>>>>>>> Hi Matevz,
>>>>>>>>
>>>>>>>> Thanks a lot for looking into this.
>>>>>>>>
>>>>>>>> - The crash seems to happen always when i make a request
>>>>>>>> - Currently prefetching is disabled
>>>>>>>> - Yes, i think it is direct proxy mode
>>>>>>>> - stack trace is attached
>>>>>>>>
>>>>>>>> A similar setup seems to work for Ilija without issues with the xcaches
>>>>>>>> using slate - i tried to mimic that setup closely. Running xrootd from
>>>>>>>> this container image:
>>>>>>>>
>>>>>>>> https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/xcache.singularity.def
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> using this config
>>>>>>>>
>>>>>>>> https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/etc/xrootd/xcache.cfg
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Nikolai
>>>>>>>>
>>>>>>>> On 7/7/20 1:38 AM, Matevz Tadel wrote:
>>>>>>>>> Hi Nikolai,
>>>>>>>>>
>>>>>>>>> I tried to reproduce it with current master in nearly all ways,
>>>>>>>>> with/without prefetching and with direct/forwarding mode. Also, with
>>>>>>>>> std malloc and tcmalloc. No luck :(
>>>>>>>>>
>>>>>>>>> Backtrace or core would help a lot at this point.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Matevz
>>>>>>>>>
>>>>>>>>> On 2020-07-03 00:54, Nikolai Hartmann wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm trying to upgrade to xrootd5 rc4 for our xcache server to
>>>>>>>>>> mitigate a problem with dCache.
>>>>>>>>>>
>>>>>>>>>> Now when i try to read a file through xcache it crashes with
>>>>>>>>>> "Attempt to free invalid pointer". I attached the corresponding part
>>>>>>>>>> of the log. Any ideas?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Nikolai
>>>>>>>>>>
>>>>>>>>>> ########################################################################
>>>>>>>>>>
>>>>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>>>>
>>>>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>