Hi Matevz,

Well, what kind of authentication? Clearly, the kind we use doesn't cause
this problem. It could be just a random core smash, but if it's random we
should see various effects, not just a crash in this particular code path,
right?

Andy


On Wed, 8 Jul 2020, Matevz Tadel wrote:

> Yay, that was a journey ... but I can reproduce it now!
>
> It is super strange this happens with xcache with authentication on only ... 
> this really should have no effect. I first tried without it and it worked and 
> then something rang a bell that you said so in the email :).
>
> Andy, does this ring any bells for you? It looks like an interaction between
> server / client usage of X509 stuff.
>
> Anyway, I'm digging on the xcache side ...
>
> Cheers,
> Matevz
>
>
>
> On 2020-07-08 07:41, Nikolai Hartmann wrote:
>> Hi Matevz,
>> 
>> I might have something like a "minimal failing example". Unfortunately
>> the problem only appears when authentication is required, so the example
>> will only work on a machine that has a valid host certificate and the
>> corresponding directory has to be bind-mounted into the container.
>> 
>> I uploaded my container image here:
>> 
>> https://cloud.physik.lmu.de/index.php/s/RFC6Q89FBxxNMXF
>> 
>> and made a directory structure (tar archive attached) to bind mount into
>> the container (and containing the minimal failing xcache config and a
>> script for starting gdb inside the container)
>> 
>> To reproduce, extract the archive, enter the directory and run (as
>> non-root user)
>> 
>> singularity run -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro \
>>     -B <hostkey-dir>:/etc/grid-security:ro <singularity-image>
>> 
>> where <hostkey-dir> is a directory that contains
>> 
>> hostkey.pem
>> hostcert.pem
>> vomsdir (will become X509_VOMS_DIR)
>> certificates (will become X509_CERT_DIR)
>> 
>> and <singularity-image> is the path to the singularity image.
>> 
>> That should run xrootd and the log should appear in
>> data/xrd/var/log/xrootd.log
>> 
>> I used this example to produce the failure:
>> 
>> xrdcp -f \
>>     root://lcg-lrz-xcache0.grid.lrz.de:1094//root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root \
>>     /dev/null
>> 
>> The simplest way to run gdb seemed to be to start xrootd directly under
>> gdb. This can be done with the script run_xcache_debug.sh in the attached
>> archive. Instead of the singularity run command above, just use
>> 
>> singularity exec -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro \
>>     -B <hostkey-dir>:/etc/grid-security:ro <singularity-image> \
>>     ./run_xcache_debug.sh
>> 
>> Note: Before restarting, it is best to delete the content of the data
>> directory, since the bug also did not seem to occur when the file was
>> already cached (e.g. after testing without authentication).
>> 
>> Sorry for the overly complicated reproduction steps, but since it only
>> happened when authentication was enabled, I didn't know how to make it
>> simpler. I hope it helps.
>> 
>> Thanks,
>> Nikolai
>> 
>> On 7/7/20 8:42 PM, Matevz Tadel wrote:
>>> Thanks Nikolai, I shall continue my investigation :)
>>> 
>>> Matevz
>>> 
>>> On 2020-07-06 23:59, Nikolai Hartmann wrote:
>>>> Hi Matevz,
>>>> 
>>>> Thanks a lot for looking into this.
>>>> 
>>>> - The crash seems to happen every time I make a request
>>>> - Currently prefetching is disabled
>>>> - Yes, I think it is direct proxy mode
>>>> - stack trace is attached
>>>> 
>>>> A similar setup seems to work for Ilija without issues with the xcaches
>>>> using slate - i tried to mimic that setup closely. Running xrootd from
>>>> this container image:
>>>> 
>>>> https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/xcache.singularity.def
>>>> 
>>>> 
>>>> using this config
>>>> 
>>>> https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/etc/xrootd/xcache.cfg
>>>> 
>>>> 
>>>> Cheers,
>>>> Nikolai
>>>> 
>>>> On 7/7/20 1:38 AM, Matevz Tadel wrote:
>>>>> Hi Nikolai,
>>>>> 
>>>>> I tried to reproduce it with current master in nearly all ways,
>>>>> with/without prefetching and with direct/forwarding mode. Also, with std
>>>>> malloc and tcmalloc. No luck :(
>>>>> 
>>>>> Backtrace or core would help a lot at this point.
>>>>> 
>>>>> Cheers,
>>>>> Matevz
>>>>> 
>>>>> On 2020-07-03 00:54, Nikolai Hartmann wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I'm trying to upgrade our xcache server to xrootd5 rc4 to mitigate
>>>>>> a problem with dCache.
>>>>>> 
>>>>>> Now when I try to read a file through xcache it crashes with
>>>>>> "Attempt to free invalid pointer". I attached the corresponding
>>>>>> part of the log. Any ideas?
>>>>>> 
>>>>>> Thanks,
>>>>>> Nikolai
>>>>>> 
>>>>>> ########################################################################
>>>>>> 
>>>>>> Use REPLY-ALL to reply to list
>>>>>> 
>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>> 
>
