Posted by: Lukasz Janyst <ljanyst>
Related to: [ROOT bugs #87880] wrong TBasket when reading via xrootd
URL: <http://savannah.cern.ch/bugs/?87880>

Follow-up Comment:

It's the client.

Submitted by: clemencic
Originator Email: 
Bug / Feature: Bug report
Category: 
Priority: 5 - Normal
Severity: 5 - Blocker
Status: Fixed
Privacy: Public
Assigned to: ljanyst
Open/Closed: Closed
Release: 5.30/00
Discussion Lock: 
Operating System: GNU/Linux

-----Reply from Marco Clemencic <clemencic> on 2011-11-03 12:51
(Europe/Warsaw)-----
Hi Lukasz,

thanks a lot for the fix.

It's not too clear to me if the problem is on the client or server side.
In any case, how can we get the fix deployed?

Thanks
Marco

-----Reply from Lukasz Janyst <ljanyst> on 2011-11-03 12:34
(Europe/Warsaw)-----
XRootD limits both the length of each chunk and the number of chunks per
vector read. If either limit is exceeded, the XrdClient::ReadV method splits
the oversized chunks and, if necessary, sends them in several separate readv
requests.

This is what happened in this case:
* after the chunks were split so that none exceeded the maximum chunk size,
there were 513 of them
* the maximum number of chunks per readv is 512, so the client issued one
readv request and one read request
* the readv request was unpacked correctly into the buffer supplied to the
client, but the read request did not take into account that a readv request
had already been unpacked, so it overwrote the beginning of the buffer
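
The sequence above can be illustrated with a small toy model. This is only a
sketch and not the XrdClient code: the function names, the tiny maximum chunk
size and the track_position switch are all made up for illustration, and the
final single-chunk read is modelled as just another request. Only the limit
of 512 chunks per readv and the count of 513 chunks come from this case.

```python
def split_chunks(chunks, max_chunk_size):
    """Split (offset, length) chunks so that none exceeds max_chunk_size."""
    pieces = []
    for offset, length in chunks:
        while length > 0:
            piece = min(length, max_chunk_size)
            pieces.append((offset, piece))
            offset += piece
            length -= piece
    return pieces


def batch_requests(chunks, max_chunks_per_readv):
    """Group chunks into requests of at most max_chunks_per_readv chunks."""
    return [chunks[i:i + max_chunks_per_readv]
            for i in range(0, len(chunks), max_chunks_per_readv)]


def unpack(file_data, requests, track_position):
    """Copy the data of every request into one contiguous user buffer."""
    total = sum(length for request in requests for _, length in request)
    buf = bytearray(total)
    pos = 0
    for request in requests:
        if not track_position:
            # Toy version of the bug: every request starts writing at the
            # beginning of the buffer, ignoring what was unpacked before.
            pos = 0
        for offset, length in request:
            buf[pos:pos + length] = file_data[offset:offset + length]
            pos += length
    return bytes(buf)


if __name__ == "__main__":
    MAX_CHUNK_SIZE = 4          # hypothetical toy value
    MAX_CHUNKS_PER_READV = 512  # limit quoted in this report
    data = bytes(i % 251 for i in range(513 * MAX_CHUNK_SIZE))
    pieces = split_chunks([(0, len(data))], MAX_CHUNK_SIZE)
    requests = batch_requests(pieces, MAX_CHUNKS_PER_READV)
    print(len(pieces), [len(r) for r in requests])
    print("tracking position:", unpack(data, requests, True) == data)
    print("without tracking: ", unpack(data, requests, False) == data)
```

In this model the second request lands on top of the first bytes of the
buffer unless the write position is carried over from the previous request,
which is the kind of corruption described above.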

http://xrootd.cern.ch/cgi-bin/cgit.cgi/xrootd/commit/?id=d10c528900539891037566b5d26c26be1c662132

-----Reply from Lukasz Janyst <ljanyst> on 2011-11-03 10:26
(Europe/Warsaw)-----
It's a bug in the vector read algorithm of xrootd. The attached file
reproduces the issue using only the xroot API.

(file #22190, file #22191)

-----Reply from Marco Cattaneo <cattanem> on 2011-11-02 16:45
(Europe/Warsaw)-----
To comment#8
Yes, that is also what we observe: we have circumstantial evidence that
switching off the cache circumvents the problem.



-----Reply from Lukasz Janyst <ljanyst> on 2011-11-02 16:32
(Europe/Warsaw)-----
On the other hand, with TTreeCache disabled all three protocols give the
correct result.

-----Reply from Lukasz Janyst <ljanyst> on 2011-11-02 15:50
(Europe/Warsaw)-----
rfio:// is consistent with castor:// but different from root://; sorry for
the mistake.

-----Reply from Lukasz Janyst <ljanyst> on 2011-11-02 15:48
(Europe/Warsaw)-----
I have hooked in the crc32 calculation for this particular buffer and it
indeed comes out different for the two protocols. I have also checked rfio://
and get a reading consistent with the access through root://. This confirms
that there is a problem on the xrootd side, either in TXNetFile or in xrootd
itself.
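
For illustration, a comparison of this kind can be sketched as follows. This
is not the hook that was actually added to ROOT; the function names are
hypothetical and the byte strings stand in for the basket buffer as read
through each protocol.

```python
import zlib


def crc32_of(buffer):
    """Return the CRC-32 of a bytes-like object as an unsigned 32-bit value."""
    return zlib.crc32(buffer) & 0xFFFFFFFF


def same_content(buf_a, buf_b):
    """Treat two buffers as matching if their length and CRC-32 agree."""
    return len(buf_a) == len(buf_b) and crc32_of(buf_a) == crc32_of(buf_b)


if __name__ == "__main__":
    # Placeholder data, not taken from the real file.
    via_first_protocol = b"basket payload"
    via_second_protocol = b"Basket payload"
    print(hex(crc32_of(via_first_protocol)))
    print(hex(crc32_of(via_second_protocol)))
    print("consistent:", same_content(via_first_protocol, via_second_protocol))
```

A matching checksum only makes a difference unlikely, while a mismatch is
proof that the two protocols delivered different bytes for the same buffer.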

-----Reply from Lukasz Janyst <ljanyst> on 2011-11-02 14:49
(Europe/Warsaw)-----
I am looking into this and will let you know as soon as I know something.

-----Reply from Marco Cattaneo <cattanem> on 2011-11-02 14:35
(Europe/Warsaw)-----
Is there any progress with this? We need a fix....

-----Reply from Marco Clemencic <clemencic> on 2011-10-28 22:07
(Europe/Warsaw)-----
Hi Lukasz,

I do not know the exact details, but you can find it here:

https://svnweb.cern.ch/trac/lhcb/browser/Online/trunk/Online/RootCnv

If you need something else, let me know.

Thanks
Marco

-----Reply from Lukasz Janyst <ljanyst> on 2011-10-24 16:39
(Europe/Warsaw)-----
Marco, can you point me to the place where you keep the code interacting with
ROOT? I mean the part where the event selector touches the ROOT API.

-----Reply from Lukasz Janyst <ljanyst> on 2011-10-24 14:45
(Europe/Warsaw)-----
I can reproduce the problem and will be looking at it.

Lukasz

  -----Original Message-----
Hi,

we have a problem (most probably) with xrootd (bug #87105).

In some cases we get strange warnings of the type

Warning in <TBasket::ReadBasketBuffers>:
basket:_Event_Bhadron_Phys_B02DstDstKSDDBeauty2CharmLine_Particle2VertexRelations.DataObject.m_version
has fNevBuf=26400 but fEntryOffset=0, pos=509266101, len=26563, fNbytes=311,
fObjlen=26400, trying to repair


The problem doesn't occur if we use rootd instead of xrootd.

At least in the case I've been testing, it turns out that we are getting
the wrong basket.

I crafted a minimal example (attached) that exposes the problem, showing the
different behavior between "root://" and "castor://" PFNs.
It reads a few events and prints the branch name for a good event and for a
bad event.

The output you should get is:

$ ./run_me.sh 
... Prepare the environment
ROOTSYS=/afs/cern.ch/sw/lcg/app/releases/ROOT/5.30.00/x86_64-slc5-gcc43-dbg/root


****** Using xrootd (PFN:root://) ******
No source file named TBranch.cxx.
warning: no loadable sections found in added symbol-file system-supplied DSO
at 0x2aaaaaac7000