Gentlemen,
	Since we now have the XrdClient code working with CLEO's
infrastructure code, it was easy to run Purify on it.  Purify found a
number of writes (and some reads) off the ends of arrays, plus a few
memory leaks.  Below I have included the Purify output along with my
comments on the exact location of each problem.

	Chris Jones
	Cornell University

------------------------------------------------------------------------

Purify ==============================================================================
ABW: Array bounds write:
   * This is occurring while in thread 6:
         memset         [rtlib.o]
         bool XrdClientMessage::CreateData() [XrdClientMessage.cc:90]
         int XrdClientMessage::ReadRaw(XrdClientPhyConnection*) [XrdClientMessage.cc:160]
         XrdClientMessage*XrdClientPhyConnection::BuildMessage(bool,bool) [XrdClientMessage.hh:55]
         SocketReaderThread [XrdClientPhyConnection.cc:49]
         _thread_start  [libthread.so.1]
   * Writing 1 byte to 0x21a9b5 in the heap.
   * Address 0x21a9b5 is 1 byte past end of a malloc'd block at 0x21a9b0 of 5 bytes.
   * This block was allocated from thread 6:
         malloc         [rtlib.o]
         bool XrdClientMessage::CreateData() [XrdClientMessage.cc:83]
         int XrdClientMessage::ReadRaw(XrdClientPhyConnection*) [XrdClientMessage.cc:160]
         XrdClientMessage*XrdClientPhyConnection::BuildMessage(bool,bool) [XrdClientMessage.hh:55]
         SocketReaderThread [XrdClientPhyConnection.cc:49]
         _thread_start  [libthread.so.1]

**Comment**
XrdClientMessage.cc, lines 83 (where the memory is allocated) to 90
(where memset is called):

          fData = malloc(fHdr.dlen+1);
          if (!fData) {
             Error("XrdClientMessage::CreateData","Fatal ERROR *** malloc failed."
                   " Probable system resources exhausted.");
             abort();
          }
          char *tmpPtr = (char *)fData;
          memset((void*)(tmpPtr+fHdr.dlen+1), 0, 1);

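memset is writing 1 byte beyond the end of the block: the block holds
fHdr.dlen+1 bytes, so its last valid offset is fHdr.dlen.  Line 90
should therefore be changed to
          memset((void*)(tmpPtr+fHdr.dlen), 0, 1);
or line 83 should be changed to
          fData = malloc(fHdr.dlen+2);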
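
The first of the two fixes can be seen in isolation in the following
minimal sketch.  It is not the XrdClient code: make_terminated_buffer
and its parameters are hypothetical stand-ins for CreateData() and
fHdr.dlen, and the payload copy is only there to make the example
self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for XrdClientMessage::CreateData(): allocate room
// for dlen payload bytes plus one terminating '\0'.
// The caller must release the result with free().
char *make_terminated_buffer(const char *payload, std::size_t dlen) {
    assert(payload != nullptr);
    char *buf = static_cast<char *>(std::malloc(dlen + 1));
    if (!buf) return nullptr;
    std::memcpy(buf, payload, dlen);
    // Valid offsets in a block of dlen+1 bytes are 0..dlen, so the
    // terminator belongs at offset dlen, not dlen+1.
    buf[dlen] = '\0';
    return buf;
}
```

With dlen == 4 the block is 5 bytes, matching the Purify report above,
and the terminator lands on the fifth byte rather than one past it.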



Purify ==============================================================================
ABR: Array bounds read:
   * This is occurring while in:
         strlen         [rtlib.o]
         void ParseRedir(XrdClientMessage*,int&,XrdClientString&,XrdClientString&) [XrdClientString.hh:67]
         XrdClientConn::ESrvErrorHandlerRetval XrdClientConn::HandleServerError(XReqErrorType&,XrdClientMessage*,ClientRequest*) [XrdClientConn.cc:1398]
         XrdClientMessage*XrdClientConn::ReadPartialAnswer(XReqErrorType&,unsigned&,ClientRequest*,bool,void**,XrdClientConn::EThreeStateReadHandler&) [XrdClientConn.cc:753]
         XrdClientMessage*XrdClientConn::ClientServerCmd(ClientRequest*,const void*,void**,void*,bool) [XrdClientConn.cc:238]
         bool XrdClientConn::SendGenCommand(ClientRequest*,const void*,void**,void*,bool,char*,ServerResponseHeader*) [XrdClientConn.cc:300]
   * Reading 24 bytes from 0x28ba5c in the heap (1 byte at 0x28ba73 illegal).
   * Address 0x28ba5c is 4 bytes into a malloc'd block at 0x28ba58 of 27 bytes.
   * This block was allocated from thread 6:
         malloc         [rtlib.o]
         bool XrdClientMessage::CreateData() [XrdClientMessage.cc:83]
         int XrdClientMessage::ReadRaw(XrdClientPhyConnection*) [XrdClientMessage.cc:160]
         XrdClientMessage*XrdClientPhyConnection::BuildMessage(bool,bool) [XrdClientMessage.hh:55]
         SocketReaderThread [XrdClientPhyConnection.cc:49]
         _thread_start  [libthread.so.1]


**Comment**

Probably this comes from XrdClientConn.cc line 55
       host = redirdata->host;
where redirdata->host is not null-terminated, since it is obtained from
the server and the server only says how long the buffer is.
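
One way to avoid depending on a terminator at all is to copy the field
using the length the server supplied.  The sketch below is only an
illustration under assumptions: host_from_wire is a hypothetical helper,
and std::string is used in place of XrdClientString, whose interface is
not shown here.  Note also that Purify traces the 27-byte block to the
same malloc in CreateData(), so the misplaced terminator from the first
report may be what leaves the buffer unterminated; that is an inference
from the two reports, not something checked against the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a string from a field whose length is known
// from the message header, without relying on a terminating '\0'.
std::string host_from_wire(const char *field, std::size_t field_len) {
    assert(field != nullptr);
    // This constructor copies exactly field_len bytes and never calls strlen.
    return std::string(field, field_len);
}
```

A caller would pass the pointer to the host field and the number of
bytes remaining in the message after any fixed-size header fields.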

Purify ==============================================================================
ABW: Array bounds write:
   * This is occurring while in:
         bool XrdClientStringMatcher::SingleMatches(char*,char*) [XrdClientStringMatcher.cc:57]
         bool XrdClientStringMatcher::Matches(char*) [XrdClientStringMatcher.cc:123]
         bool XrdClientConn::CheckHostDomain(XrdClientString,XrdClientString,XrdClientString) [XrdClientConn.cc:415]
         XrdClientConn::ESrvErrorHandlerRetval XrdClientConn::HandleServerError(XReqErrorType&,XrdClientMessage*,ClientRequest*) [XrdClientConn.cc:1425]
         XrdClientMessage*XrdClientConn::ReadPartialAnswer(XReqErrorType&,unsigned&,ClientRequest*,bool,void**,XrdClientConn::EThreeStateReadHandler&) [XrdClientConn.cc:753]
         XrdClientMessage*XrdClientConn::ClientServerCmd(ClientRequest*,const void*,void**,void*,bool) [XrdClientConn.cc:238]
   * Writing 1 byte to 0x250c3f in the heap.
   * Address 0x250c3f is 1 byte before start of malloc'd block at 0x250c40 of 1 byte.
   * This block was allocated from:
         malloc         [rtlib.o]
         strdup         [libc.so.1]
         bool XrdClientStringMatcher::SingleMatches(char*,char*) [XrdClientStringMatcher.cc:50]
         bool XrdClientStringMatcher::Matches(char*) [XrdClientStringMatcher.cc:123]
         bool XrdClientConn::CheckHostDomain(XrdClientString,XrdClientString,XrdClientString) [XrdClientConn.cc:415]
         XrdClientConn::ESrvErrorHandlerRetval XrdClientConn::HandleServerError(XReqErrorType&,XrdClientMessage*,ClientRequest*) [XrdClientConn.cc:1425]

**Comment**

This time Purify seems to be complaining about the line above the one
it reports, i.e. line 55:

    if (starend)
       plainexp[strlen(plainexp)-1] = '\0';

The memory was allocated at

    if (starbeg)
       plainexp = strdup(expr+1);

but strlen(expr) == 1, so strdup created a string of length 0 (with a
memory size of 1, since it added the '\0').  strlen(plainexp)-1 is
therefore effectively -1, and the write lands 1 byte before the start
of the string.
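
Since starbeg is set and strlen(expr) == 1, the expression is presumably
a lone '*'.  A guard on the length before each removal avoids the
problem.  The sketch below shows the idea with std::string; it is not
the XrdClientStringMatcher code, and strip_wildcards is a hypothetical
name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: strip a leading and/or trailing '*' from expr,
// checking that the string is non-empty before each removal.
std::string strip_wildcards(const std::string &expr) {
    std::string plain = expr;
    if (!plain.empty() && plain[0] == '*')
        plain.erase(0, 1);                  // drop the leading '*'
    if (!plain.empty() && plain[plain.size() - 1] == '*')
        plain.erase(plain.size() - 1);      // drop the trailing '*'
    assert(plain.size() <= expr.size());    // sanity check: we only remove
    return plain;
}
```

For a lone '*' the first removal leaves an empty string, so the second
test fails on empty() and nothing is written before the buffer.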





Purify ==============================================================================

MLK: 11520 bytes leaked in 20 blocks
   * This memory was allocated from:
         malloc         [rtlib.o]
         XrdOucHash<_pthread_cond>::XrdOucHash(int,int,int) [XrdOucHash.icc:36]
         XrdClientInputBuffer::XrdClientInputBuffer() [XrdClientVector.hh:34]
         XrdClientPhyConnection::XrdClientPhyConnection(XrdClientAbsUnsolMsgHandler*) [XrdClientPhyConnection.cc:63]
         short XrdClientConnectionMgr::Connect(XrdClientUrlInfo) [XrdClientConnMgr.cc:315]
         short XrdClientConn::Connect(XrdClientUrlInfo) [XrdClientString.hh:85]
   * Block of 576 bytes (20 times); last block at 0x28ead0

MLK: 6400 bytes leaked in 20 blocks
   * This memory was allocated from:
         malloc         [rtlib.o]
         c2n6Fi_Pv___1  [libCrun.so.1]
         void*operator new(unsigned) [rtlib.o]
         short XrdClientConnectionMgr::Connect(XrdClientUrlInfo) [XrdClientConnMgr.cc:315]
         short XrdClientConn::Connect(XrdClientUrlInfo) [XrdClientString.hh:85]
         XReqErrorType XrdClientConn::GoToAnotherServer(XrdClientUrlInfo) [XrdClientString.hh:85]
   * Block of 320 bytes (20 times); last block at 0x28bac8
[Plus other leaks that all come from XrdClientPhyConnection]

**Comment**

It looks like I am leaking memory on each connection: I only did 20
data reads, which matches the '20 times' in the leaks above (why I am
getting a new connection for each read I don't understand).  Although
I don't trust Purify's identification of line 315 in
XrdClientConnMgr.cc, I do think the only new in that routine might be
the leak:


       phyconn = new XrdClientPhyConnection(this);

There is definitely a leak at line 274 of that routine, in the else
branch of

       if ( phyconn && phyconn->Connect(RemoteServ) ) { //line 262
          ....
       } else
          return -1; //leak if phyconn->Connect failed
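
A minimal sketch of one way to close that failure path follows.  It is
not the XrdClientConnMgr code: FakePhyConnection and connect_or_cleanup
are hypothetical stand-ins, and std::unique_ptr assumes a C++11
compiler.  With an older compiler, an explicit delete phyconn before
the return -1 has the same effect.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for XrdClientPhyConnection; it counts live
// instances so that a leak would be observable.
struct FakePhyConnection {
    static int live;
    bool ok;
    explicit FakePhyConnection(bool connect_ok) : ok(connect_ok) { ++live; }
    ~FakePhyConnection() { --live; }
    bool Connect() const { return ok; }
};
int FakePhyConnection::live = 0;

// Returns 0 and hands the connection to `out` on success.  Returns -1 on
// failure, in which case the unique_ptr destroys the connection on return.
short connect_or_cleanup(bool connect_ok,
                         std::unique_ptr<FakePhyConnection> &out) {
    assert(!out);  // the caller passes an empty holder
    std::unique_ptr<FakePhyConnection> phyconn(
        new FakePhyConnection(connect_ok));
    if (phyconn->Connect()) {
        out = std::move(phyconn);
        return 0;
    }
    return -1;  // phyconn goes out of scope here, so nothing is leaked
}
```

Tying the object's lifetime to a scope means every early return cleans
up, which is easier to keep correct than a delete on each branch.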


As for the other instances of XrdClientPhyConnection, they only go away
if the GarbageCollection works correctly.  Perhaps the timeout
condition in the GarbageCollection code is never met before the program
stops (so it really isn't a bad leak), or there is a problem with the
GarbageCollection.