Gentlemen, Since we now have the XrdClient code working with CLEO's infrastructure code, it was easy to run Purify on the code. It found a number of 'off the ends of array writes' (some reads) and a few memory leaks. Below I have included the output of Purify and my comments about the exact place of the problem. Chris Jones Cornell University ------------------------------------------------------------------------ ------------------------------------------------------------------------ ---- Purify ======================================================================== ====== ABW: Array bounds write: * This is occurring while in thread 6: memset [rtlib.o] bool XrdClientMessage::CreateData() [XrdClientMessage.cc:90] int XrdClientMessage::ReadRaw(XrdClientPhyConnection*) [XrdClientMessage.cc:160] XrdClientMessage*XrdClientPhyConnection::BuildMessage(bool,bool) [XrdClientMessage.hh:55] SocketReaderThread [XrdClientPhyConnection.cc:49] _thread_start [libthread.so.1] * Writing 1 byte to 0x21a9b5 in the heap. * Address 0x21a9b5 is 1 byte past end of a malloc'd block at 0x21a9b0 of 5 bytes. * This block was allocated from thread 6: malloc [rtlib.o] bool XrdClientMessage::CreateData() [XrdClientMessage.cc:83] int XrdClientMessage::ReadRaw(XrdClientPhyConnection*) [XrdClientMessage.cc:160] XrdClientMessage*XrdClientPhyConnection::BuildMessage(bool,bool) [XrdClientMessage.hh:55] SocketReaderThread [XrdClientPhyConnection.cc:49] _thread_start [libthread.so.1] **Comment** XrdClientMessage.cc lines 83 (where memory allocated) to 90 (where memset called) fData = malloc(fHdr.dlen+1); if (!fData) { Error("XrdClientMessage::CreateData","Fatal ERROR *** malloc failed." " Probable system resources exhausted."); abort(); } char *tmpPtr = (char *)fData; memset((void*)(tmpPtr+fHdr.dlen+1), 0, 1); memset is writing to 1 beyond the end. So line 90 should be changed to memset((void*)(tmpPtr+fHdr.dlen), 0, 1); or line 83 should be set to fData = malloc(fHdr.dlen+2); Purify ======================================================================== ====== ABR: Array bounds read: * This is occurring while in: strlen [rtlib.o] void ParseRedir(XrdClientMessage*,int&,XrdClientString&,XrdClientString&) [XrdClientString.hh:67] XrdClientConn::ESrvErrorHandlerRetval XrdClientConn:: HandleServerError(XReqErrorType&,XrdClientMessage*,ClientRequest*) [XrdClientConn.cc:1398] XrdClientMessage*XrdClientConn:: ReadPartialAnswer(XReqErrorType&,unsigned&,ClientRequest*,bool,void**,Xr dClientConn::EThreeStateReadHandler&) [XrdClientConn.cc:753] XrdClientMessage*XrdClientConn::ClientServerCmd(ClientRequest*,const void*,void**,void*,bool) [XrdClientConn.cc:238] bool XrdClientConn::SendGenCommand(ClientRequest*,const void*,void**,void*,bool,char*,ServerResponseHeader*) [XrdClientConn.cc:300] * Reading 24 bytes from 0x28ba5c in the heap (1 byte at 0x28ba73 illegal). * Address 0x28ba5c is 4 bytes into a malloc'd block at 0x28ba58 of 27 bytes. * This block was allocated from thread 6: malloc [rtlib.o] bool XrdClientMessage::CreateData() [XrdClientMessage.cc:83] int XrdClientMessage::ReadRaw(XrdClientPhyConnection*) [XrdClientMessage.cc:160] XrdClientMessage*XrdClientPhyConnection::BuildMessage(bool,bool) [XrdClientMessage.hh:55] SocketReaderThread [XrdClientPhyConnection.cc:49] _thread_start [libthread.so.1] **Comment** probably this comes from XrdClientConn.cc line 55 host = redirdata->host; where redirdata->host is not null terminated since it is obtained from the server and the server just says how long the buffer is. Purify ======================================================================== ====== ABW: Array bounds write: * This is occurring while in: bool XrdClientStringMatcher::SingleMatches(char*,char*) [XrdClientStringMatcher.cc:57] bool XrdClientStringMatcher::Matches(char*) [XrdClientStringMatcher.cc:123] bool XrdClientConn:: CheckHostDomain(XrdClientString,XrdClientString,XrdClientString) [XrdClientConn.cc:415] XrdClientConn::ESrvErrorHandlerRetval XrdClientConn:: HandleServerError(XReqErrorType&,XrdClientMessage*,ClientRequest*) [XrdClientConn.cc:1425] XrdClientMessage*XrdClientConn:: ReadPartialAnswer(XReqErrorType&,unsigned&,ClientRequest*,bool,void**,Xr dClientConn::EThreeStateReadHandler&) [XrdClientConn.cc:753] XrdClientMessage*XrdClientConn::ClientServerCmd(ClientRequest*,const void*,void**,void*,bool) [XrdClientConn.cc:238] * Writing 1 byte to 0x250c3f in the heap. * Address 0x250c3f is 1 byte before start of malloc'd block at 0x250c40 of 1 byte. * This block was allocated from: malloc [rtlib.o] strdup [libc.so.1] bool XrdClientStringMatcher::SingleMatches(char*,char*) [XrdClientStringMatcher.cc:50] bool XrdClientStringMatcher::Matches(char*) [XrdClientStringMatcher.cc:123] bool XrdClientConn:: CheckHostDomain(XrdClientString,XrdClientString,XrdClientString) [XrdClientConn.cc:415] XrdClientConn::ESrvErrorHandlerRetval XrdClientConn:: HandleServerError(XReqErrorType&,XrdClientMessage*,ClientRequest*) [XrdClientConn.cc:1425] **Comment** This time, Purify seems to be complaining about the line above which it is reporting (i.e. line 55) if (starend) plainexp[strlen(plainexp)-1] = '\0'; but the memory was allocated from if (starbeg) plainexp = strdup(expr+1); but strlen(expr) == 1 and therefore strdup created a string of length 0 (but of memory size 1 since it added the '\0'). So therefore strlen(plainexp)-1 == -1 and you write to 1 byte before the start of the string. Purify ======================================================================== ====== MLK: 11520 bytes leaked in 20 blocks * This memory was allocated from: malloc [rtlib.o] XrdOucHash<_pthread_cond>::XrdOucHash(int,int,int) [XrdOucHash.icc:36] XrdClientInputBuffer::XrdClientInputBuffer() [XrdClientVector.hh:34] XrdClientPhyConnection:: XrdClientPhyConnection(XrdClientAbsUnsolMsgHandler*) [XrdClientPhyConnection.cc:63] short XrdClientConnectionMgr::Connect(XrdClientUrlInfo) [XrdClientConnMgr.cc:315] short XrdClientConn::Connect(XrdClientUrlInfo) [XrdClientString.hh:85] * Block of 576 bytes (20 times); last block at 0x28ead0 MLK: 6400 bytes leaked in 20 blocks * This memory was allocated from: malloc [rtlib.o] c2n6Fi_Pv___1 [libCrun.so.1] void*operator new(unsigned) [rtlib.o] short XrdClientConnectionMgr::Connect(XrdClientUrlInfo) [XrdClientConnMgr.cc:315] short XrdClientConn::Connect(XrdClientUrlInfo) [XrdClientString.hh:85] XReqErrorType XrdClientConn::GoToAnotherServer(XrdClientUrlInfo) [XrdClientString.hh:85] * Block of 320 bytes (20 times); last block at 0x28bac8 [Plus other leaks that all come from XrdClientPhyConnection] **Comment** It looks like I am getting memory leaks for each connection (I only did 20 data reads which matches the '20 times' for the above leaks [why I'm getting a new connection each read I don't understand]). Although I don't believe purify's identification of line 315 in XrdClientConnMgr.cc, I do think the only new in that routine might be the leak phyconn = new XrdClientPhyConnection(this); there is definitely a leak at line 274 of that routine at the if ( phyconn && phyconn->Connect(RemoteServ) ) { //line 262 .... } else return -1; //leak if pyconn->Connect failed As for the other instances of XrdClientPhyConnection, they only go away if the GarbageCollection was correct. Perhaps the time out constraint on the GarbageCollection code is never met before the program stops (so it really isn't a bad leak) OR there is a problem with the GarbageCollection.