I believe the ioctl I found back then is called FIONREAD. You could
actually check how many sockets you have and how much data is available
for reading, and cheat by introducing a short sleep in the poller
depending on those parameters. You will either fix things or disprove
my theory ;)
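
Something along these lines (FIONREAD is the real ioctl; the helper,
the threshold, and the sleep duration are all made up and would need
tuning; I'm also ignoring the number-of-sockets part here):

  #include <sys/ioctl.h>
  #include <unistd.h>

  /* Before re-entering poll(), check how much data is already queued
   * on the socket; if it is only a tiny amount, sleep briefly so the
   * kernel can accumulate a bigger chunk. Values are illustrative. */
  static void maybe_backoff(int fd)
  {
      int queued = 0;
      if (ioctl(fd, FIONREAD, &queued) == 0 &&
          queued > 0 && queued < 16 * 1024)
          usleep(100);
  }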

   Lukasz

On Sun, 17 May 2015 23:25:01 +0200
Lukasz Janyst <[log in to unmask]> wrote:

> Somebody should probably have a look at this. Forwarding for
> documentation purposes.
> 
>    Lukasz
> 
> Begin forwarded message:
> 
> Date: Sun, 17 May 2015 23:19:47 +0200
> From: Lukasz Janyst <[log in to unmask]>
> To: Andreas-Joachim Peters <[log in to unmask]>
> Subject: Re: 10G Question
> 
> 
> Hi Andreas,
> 
>    this was not fixed, but from what I could tell this might actually
> have been related to the particular version of kernel/driver.
> 
>    The situation is as follows:
> 
> 1) If you only have one active socket on a 10G interface you get only
> around 50% of bandwidth utilisation.
> 2) If you have two or more active sockets you saturate the link.
> 
>    Now, from my observation/research:
> 
> 1) Socket watermarking (the SO_RCVLOWAT sockopt and friends) is
> either not implemented or broken in Linux.
> 2) When a tiny bit of data arrives in the NIC buffer, the TCP stack
> triggers a poll event for the descriptor. This results in a switch to
> user space.
> 3) This tiny bit of data is then retrieved from the descriptor using
> a read call on a non-blocking socket (another user-to-kernel-space
> switch and back).
> 4) Another read request is issued (another trip to kernel space and
> back) on the socket which, this time, results in errno being set to
> EWOULDBLOCK, and the client logic decides to go back to the poll
> syscall, which quickly gets triggered again and we're back at 2 (see
> the sketch below). The cycle repeats itself, and we spend all of our
> time doing useless context switches which could easily be prevented
> if SO_RCVLOWAT actually worked.
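> 
> For illustration, the loop looks roughly like this (fd is assumed to
> be a connected non-blocking TCP socket; the buffer size is
> arbitrary):
> 
>   #include <errno.h>
>   #include <poll.h>
>   #include <unistd.h>
> 
>   static void read_loop(int fd)
>   {
>       char buf[64 * 1024];
>       struct pollfd pfd = { .fd = fd, .events = POLLIN };
> 
>       for (;;) {
>           poll(&pfd, 1, -1);  /* 2: fires as soon as any data queues */
>           for (;;) {
>               ssize_t n = read(fd, buf, sizeof buf);
>               if (n > 0)
>                   continue;   /* 3: consumed the (tiny) chunk */
>               if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
>                   break;      /* 4: nothing left, back to poll() */
>               return;         /* EOF or a real error */
>           }
>       }
>   }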
> 
>    For two or more sockets, the timing changes and the poll events are
> triggered when larger chunks of data have been accumulated. The more
> sockets/connections we have, the better it gets.
> 
>    I vaguely recall that there was an ioctl option controlling this,
> but I cannot find it now. Alternatively, you can have a look at the
> driver or implement proper socket watermarking in the TCP stack of
> Linux :)
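> 
> For reference, setting the watermark itself is trivial; the problem
> is that poll() does not seem to honour it (fd and the 64k threshold
> below are illustrative):
> 
>   #include <stdio.h>
>   #include <sys/socket.h>
> 
>   static void set_rcvlowat(int fd)
>   {
>       int lowat = 64 * 1024;  /* wake us only once 64k is queued */
>       if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
>                      &lowat, sizeof(lowat)) < 0)
>           perror("setsockopt(SO_RCVLOWAT)");
>   }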
> 
>    I hope this helps.
> 
> Cheers,
>    Lukasz
> 
> On Wed, 13 May 2015 16:55:48 +0200
> Andreas-Joachim Peters <[log in to unmask]> wrote:
> 
> > Hi Lukasz,
> > 
> > in case you remember something quickly.
> > 
> > We have some problems with the performance of copy commands over
> > 10G.
> > 
> > If I compare the iperf performance with copy performance of xrootd I
> > get only 50% with xrootd.
> > 
> > I played with the chunk size etc., but with the 3.3.6 versions it
> > does not seem to change much.
> > 
> > Do you remember anything particular that was fixed only in the
> > 4.x versions concerning window scaling etc.?
> > 
> > I hope you are all well.
> > 
> > Cheers Andreas.
> 
