When using SSI streams, there are two environment variables which control the length of timeouts, `XRD_REQUESTTIMEOUT` and `XRD_STREAMTIMEOUT`. It is not clear how these two timeouts are supposed to interact and in some cases the behaviour is strange. Example setup: I have a simple XRootD SSI server which does round-robin requests. The client sends a simple payload and a number of records to return, and the server sends back the payload the requested number of times in an SSI stream. For this test, the server adds a 5-second delay to each chunk of the stream, to better observe how the timeouts interact. Testing with `test-client stream 1000`, which returns 1000 records in a stream. The requested buffer size is 1024 bytes and the payload is in the order of 100 bytes, so we expect ~10 chunks to be returned on the stream. As we added a 5 sec delay to each chunk, the total processing time is around 50s. **CASE 1. XRD_REQUESTTIMEOUT=1, XRD_STREAMTIMEOUT=2** ``` $ time test/src/test-client stream 1000 180925 11:41:19 22835 ssi_Pb::ServiceClientSide: pid:22835 tid:140317381859456 ServiceClientSide constructor: Request timeout = 1s 180925 11:41:19 22835 ssi_Pb::IStreamBuffer: pid:22835 tid:140317381859456 Called IStreamBuffer() constructor 180925 11:41:19 22835 ssi_Pb::Request: pid:22835 tid:140317381859456 Request() constructor: Response buffer size = 1024 bytes, request timeout = 1s 180925 11:41:19 22843 ssi_Pb::Request: pid:22835 tid:140317244593920 ProcessResponse(): response type = isStream 180925 11:41:19 22843 ssi_Pb::Request: pid:22835 tid:140317244593920 ProcessResponseMetadata(): received 74 bytes Count Int64 Value Double Value Bool String Value 180925 11:41:20 22842 ssi_Pb::Request: pid:22835 tid:140317252986624 ProcessResponseData(): received -1 bytes terminate called after throwing an instance of 'XrdSsiPb::XrdSsiException' what(): [ERROR] Operation expired Aborted real 0m1.022s ``` In this case, the server returns the stream handle within the 1s, but it is still the REQUEST timeout which triggers, not the STREAM timeout. **CASE 2. XRD_REQUESTTIMEOUT=3, XRD_STREAMTIMEOUT=2** Same as above, server responds in time, but the REQUEST timeout is triggered, not the STREAM timeout. **CASE 3. XRD_REQUESTTIMEOUT=6, XRD_STREAMTIMEOUT=2** ``` $ time test/src/test-client stream 1000 180925 11:45:03 24983 ssi_Pb::ServiceClientSide: pid:24983 tid:140377820215424 ServiceClientSide constructor: Request timeout = 6s 180925 11:45:03 24983 ssi_Pb::IStreamBuffer: pid:24983 tid:140377820215424 Called IStreamBuffer() constructor 180925 11:45:03 24983 ssi_Pb::Request: pid:24983 tid:140377820215424 Request() constructor: Response buffer size = 1024 bytes, request timeout = 6s 180925 11:45:03 24990 ssi_Pb::Request: pid:24983 tid:140377691342592 ProcessResponse(): response type = isStream 180925 11:45:03 24990 ssi_Pb::Request: pid:24983 tid:140377691342592 ProcessResponseMetadata(): received 74 bytes Count Int64 Value Double Value Bool String Value 180925 11:45:05 24991 ssi_Pb::Request: pid:24983 tid:140377682949888 ProcessResponseData(): received -1 bytes terminate called after throwing an instance of 'XrdSsiPb::XrdSsiException' what(): [ERROR] Socket error Aborted real 0m2.025s ``` This time the STREAM timeout is triggered, and the error is `Socket error` instead of `Operation expired`. **CASE 4. XRD_REQUESTTIMEOUT=6, XRD_STREAMTIMEOUT=7** ``` $ time test/src/test-client stream 1000 180925 11:27:07 14169 ssi_Pb::ServiceClientSide: pid:14169 tid:139848111949952 ServiceClientSide constructor: Request timeout = 10s 180925 11:27:07 14169 ssi_Pb::IStreamBuffer: pid:14169 tid:139848111949952 Called IStreamBuffer() constructor 180925 11:27:07 14169 ssi_Pb::Request: pid:14169 tid:139848111949952 Request() constructor: Response buffer size = 1024 bytes, request timeout = 10s 180925 11:27:07 14177 ssi_Pb::Request: pid:14169 tid:139847983077120 ProcessResponse(): response type = isStream 180925 11:27:07 14177 ssi_Pb::Request: pid:14169 tid:139847983077120 ProcessResponseMetadata(): received 74 bytes Count Int64 Value Double Value Bool String Value 180925 11:27:12 14178 ssi_Pb::Request: pid:14169 tid:139847974684416 ProcessResponseData(): received 1024 bytes 1 0 0 false 2 0 0 false 3 0 0 false 4 0 0 false 5 0 0 false 6 0 0 false ... 920 0 0 false 921 0 0 false 922 0 0 false 923 0 0 false 924 0 0 false 180925 11:49:45 27486 ssi_Pb::Request: pid:27472 tid:140401844549376 ProcessResponseData(): request more response data 180925 11:49:51 27487 ssi_Pb::Request: pid:27472 tid:140401836156672 ProcessResponseData(): received -1 bytes terminate called after throwing an instance of 'XrdSsiPb::XrdSsiException' what(): [ERROR] Operation expired Aborted real 0m46.033s ``` In this case, the client receives 9 of the 10 chunks, but times out on the last one with `Operation expired`. Here are the server-side logs for this case: ``` 180925 11:49:05 1055 XrootdXeq: ctadev.27472:22@localhost pvt IP64 login 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Prepare(): 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Resource name: /test 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Resource user: 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Resource info: 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Hosts to avoid: 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Affinity: Default 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Resource options: 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Resource Client: 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Protocol: host 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Name: NULL 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Host: localhost 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Vorg: NULL 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Role: NULL 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Grps: NULL 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Endorsements: NULL 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Creds: NULL 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Credslen: 0 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Rsvd: 0 180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Tident: 180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_open: /test prepared. 180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_write: 0:/test rsz=5 wsz=5 180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_Activate: 0:/test [new wtReq] oucbuff rqsz=5 180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_DoIt: 0:/test [begun xqReq] Calling service processor 180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_BindDone: 0:/test [begun xqReq] Bind called; for request 0 180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_GetRequest: 0:/test [bound xqReq] sz=5 180925 11:49:05 30749 ssi_Pb::RequestProc: pid:30671 tid:140715575441152 RequestProc(): received 5 bytes 180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_RelReqBuff: 0:/test [bound xqReq] called 180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_ProcessResponse: 0:/test [bound xqReq] Response presented wtr=False 180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_ProcessResponse: 0:/test [bound doRsp] Resp strm 180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_fctl: 0:/test query resp status 180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_AttnInfo: 0:/test 74 byte metadata (0x080112461b5b33...) sent. 180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_fctl: 0:/test resp ready 180925 11:49:10 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1024 bytes) 180925 11:49:10 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1025 bytes of data. 180925 11:49:15 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1023 bytes) 180925 11:49:15 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data. 180925 11:49:20 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1021 bytes) 180925 11:49:20 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data. 180925 11:49:25 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1019 bytes) 180925 11:49:25 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data. 180925 11:49:30 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1017 bytes) 180925 11:49:30 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1017 bytes of data. 180925 11:49:35 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1024 bytes) 180925 11:49:35 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data. 180925 11:49:40 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1022 bytes) 180925 11:49:40 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data. 180925 11:49:45 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1020 bytes) 180925 11:49:45 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data. 180925 11:49:50 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1018 bytes) 180925 11:49:50 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 675 bytes of data. 180925 11:49:55 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (343 bytes) 180925 11:49:55 1055 ctadev.27472:22@localhost Ssi_Finalize: 0:/test [bound odRsp] Calling Finished(False) 180925 11:49:55 30749 ctadev.27472:22@localhost Ssi_Dispose: 0:/test [bound odRsp] Recycling request... 180925 11:49:55 30676 XrdPoll: Sever event occured for ctadev.27472:22@localhost 180925 11:49:55 1055 XrdLink: Unable to send to ctadev.27472:22@localhost; broken pipe 180925 11:49:55 1055 XrootdXeq: ctadev.27472:22@localhost disc 0:00:50 (send failure) 180925 11:49:55 1055 ctadev.27472:22@localhost Ssi_close: /test del=True ``` **CASE 5. XRD_REQUESTTIMEOUT=6, XRD_STREAMTIMEOUT=60** This gives exactly the same behaviour as above. **CASE 6. XRD_REQUESTTIMEOUT=15, XRD_STREAMTIMEOUT=6** This is the same as above, except it gives a `Socket error` instead of `Operation expired`. **CASE 7. XRD_REQUESTTIMEOUT=11, XRD_STREAMTIMEOUT=60** Finally, success. **SUMMARY** The behaviour of XRD_REQUESTTIMEOUT and XRD_STREAMTIMEOUT and their interaction is not documented, so I am trying to figure out how it should work by trial and error. What I was expecting is that XRD_REQUESTTIMEOUT would place a limit on the time to receive the stream handle from the server, and XRD_STREAMTIMEOUT would place a limit on the time to receive each chunk of the stream. But it seems this is not the case. It looks like XRD_REQUESTTIMEOUT is used for two distinct purposes, one for the time to receive the stream handle, and also for the time to receive each chunk of the stream. Also XRD_REQUESTTIMEOUT needs to be set to at least double the time taken to process one chunk, as the final chunk of the stream requires two calls to the stream object (one to partially fill the buffer with the last remaining data, then again to set `last = true` and return `nullptr`). XRD_STREAMTIMEOUT is used to control the total amount of time to receive the entire stream, and there is no way to independently control the timeout on each chunk of the stream. Please can you confirm that this is the intended behaviour of these timeouts? XRD_STREAMTIMEOUT is not so useful in its current form. We need a way to limit the time taken to process each chunk of the stream, not the total time. Also we need to be able to control the time taken to process each chunk of the stream independently from the request timeout. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/xrootd/xrootd/issues/829 ######################################################################## Use REPLY-ALL to reply to list To unsubscribe from the XROOTD-DEV list, click the following link: https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1