Hi Catalin,

OK, I will see why the flush doesn't redrive itself (it's supposed to). As
for self-consistency, yes there is a problem with lost datagrams. In most
cases, that will not happen in practice on the local area network if the
collector is well matched to the systems being monitored. We chose UDP
specifically for its ability to drop packets when the system becomes
overloaded. Not doing so would eventually lead to a melt-down of the
complete system. It's a disputable trade-off.

Andy

On Thu, 28 Apr 2005, Catalin Cirstoiu wrote:

> Hi Andy,
>
> I modified the configuration of the xrootd dataserver and removed the io
> parameter from xrootd.monitor:
>
> xrootd.monitor all flush 30s window 15s dest files info user
> pcardaab.cern.ch:9930
>
> Now it seems that the window is OK and the flush rule works, but it
> works only once! I get a single trace datagram, no matter how many
> xrdcp's I run afterwards; I only get the map datagrams. I get
> another trace datagram only after restarting xrootd.
>
> As a remark about the documentation: it says that the datagrams
> are self-consistent. This is true with regard to the time from windows;
> however, it is not true for dictid mappings. If, for instance, a map
> datagram is lost, you can do nothing with the trace that refers to it.
>
> Catalin.
>
> Andy Hanushevsky wrote:
> > Hi Catalin,
> >
> >> I modified the configuration a little so that the flush is done after
> >> 30s and the window is 15s, to better show points 1) and 2).
> >
> > OK, now I see your point. The flush applies only to non-I/O monitoring
> > data (i.e., everything but actual read/write calls). A stream that
> > includes I/O events automatically disables the flush feature as well.
> > Typically, people will segregate the streams, using the summary data for
> > real-time reporting and I/O event data for application analysis. We did
> > this because we cannot provide a flush feature for detailed I/O
> > information. To do so would bog down the server to the point of making
> > not only the flushing useless but also the server. We tried; it doesn't
> > scale. Well, at least now I know that the documentation is not clear on
> > this point.
> >
> >> I also attached the logs and binary dumps I obtained from my ML module.
> >
> > Thanks, I'll look at them.
> >
> > Andy
>
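
The dictid problem raised in the thread (a trace is unusable when the map
datagram it refers to was lost) can be sketched from the collector's side.
This is only an illustration under stated assumptions: the Collector class,
its method names, and the (dictid, payload) tuples are hypothetical and do
not reflect the actual xrootd monitoring packet format or any existing
collector. It simply parks traces whose dictid is unknown and resolves them
if the matching map datagram shows up later (UDP may also reorder packets).

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Hypothetical collector state; not the real xrootd monitoring API."""

    # dictid -> info learned from map datagrams (assumed to be a plain string)
    mappings: dict = field(default_factory=dict)
    # (info, payload) pairs for traces whose dictid was known
    resolved: list = field(default_factory=list)
    # (dictid, payload) pairs for traces whose map datagram has not been seen
    unresolved: list = field(default_factory=list)

    def on_map(self, dictid, info):
        """Record a mapping and resolve any traces parked under that dictid."""
        self.mappings[dictid] = info
        still_waiting = []
        for parked_id, payload in self.unresolved:
            if parked_id == dictid:
                self.resolved.append((info, payload))
            else:
                still_waiting.append((parked_id, payload))
        self.unresolved = still_waiting

    def on_trace(self, dictid, payload):
        """Resolve a trace if its dictid is known; otherwise park it."""
        info = self.mappings.get(dictid)
        if info is None:
            # The map datagram was lost or has not arrived yet.
            self.unresolved.append((dictid, payload))
        else:
            self.resolved.append((info, payload))
```

If the map datagram never arrives, the parked entries stay in unresolved
indefinitely, which is exactly the loss of self-consistency described above;
a real collector would also need some policy for expiring them.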