Hi Gregory,

I'm using KanCollUtil on babar2 and this works fine. The other command,
which I'm running in batch, is StmSubmitJobs from the
BbkSkimManager-Tools.

Cheers,

Miriam

*************************************************************************

Dr. Miriam Fritsch

Institut fuer Experimentalphysik I
Ruhr-Universitaet Bochum, Germany               email: [log in to unmask]
c/o SLAC                                        tel:  +1 (650) 926-3565
2575 Sand Hill Road #34                         fax:  +1 (650) 926-3882
Menlo Park, CA 94025, USA                       home: +1 (650) 324-2813

*************************************************************************

On Wed, 9 Feb 2005, Gregory Schott wrote:

> Hello Miriam,
>
> I am forwarding your mail to the xrootd mailing list.
>
> The problem Miriam has is that when she uses xrootd the request fails
> and the redirector babar2 is crashing.
>
> I have restarted xrootd, but I could not find the core file, although
> 'ulimit -c' says unlimited.
>
> Then I tried on babar and babar2:
> KanCopyUtil -r -n 20 /store/SP/R12/000998/200301/12.6.0b/SP_000998_011885
> and had no problem... (no crash)
>
> What command are you using? Are you running in batch? What I find strange
> is that I seem to be logged in from a different machine. I ran ssh babar2,
> then csh -l, and then KanCopyUtil, and in the xrd log I see
> "User logged in as schott.28977:[log in to unmask]", which seems to
> be babar.fzk.de!? But env | grep HOST confirms I am on babar2.
>
> I see in the logfile for you:
>
> 050209 19:35:20 6055 XrootdXeq: User logged in as skimprod.27917:[log in to unmask]
> 050209 19:35:20 6055 odc_Locate: user=skimprod.27917:[log in to unmask] wait  5 path=/store/SP/R12/000998/200301/12.6.0b/SP_000998_011887.01.root
> 050209 19:35:25 6055 odc_Locate: user=skimprod.27917:[log in to unmask] redirected to f01-001-118.gridka.de 1094 path=/store/SP/R12/000998/200301/12.6.0b/SP_000998_011887.01.root
> 050209 19:35:26 6055 XrdLink: skimprod.27917:[log in to unmask] disconnected after 0:00:06
>
> and for me:
>
> 050209 20:04:05 31000 XrootdXeq: User logged in as schott.28977:[log in to unmask]
> 050209 20:04:05 31000 odc_Locate: user=schott.28977:[log in to unmask] wait  5 path=/store/SP/R12/000998/200301/12.6.0b/SP_000998_011885.01.root
> 050209 20:04:10 31000 odc_Locate: user=schott.28977:[log in to unmask] redirected to f01-001-118.gridka.de 1094 path=/store/SP/R12/000998/200301/12.6.0b/SP_000998_011885.01.root
> 050209 20:04:12 31000 odc_Locate: user=schott.28977:[log in to unmask] redirected to f01-001-118.gridka.de 1094 path=/store/SP/R12/000998/200301/12.6.0b/SP_000998_011885.01.root
> 050209 20:04:14 31000 odc_Locate: user=schott.28977:[log in to unmask] wait  5 path=/store/SP/R12/000998/200301/12.6.0b/SP_000998_011885.02E.root
> 050209 20:04:19 31000 odc_Locate: user=schott.28977:[log in to unmask] redirected to f01-001-118.gridka.de 1094 path=/store/SP/R12/000998/200301/12.6.0b/SP_000998_011885.02E.root
> 050209 20:04:23 31000 XrdLink: schott.28977:[log in to unmask] disconnected after 0:00:18
>
> and in the xrootd log of f01-001-118.gridka.de
>
> 050209 19:35:25 6912 XrootdXeq: User logged in as skimprod.27917:[log in to unmask]
> 050209 19:35:26 6912 XrdLink: skimprod.27917:[log in to unmask] disconnected after 0:00:01
> 050209 20:04:11 6912 XrootdXeq: User logged in as schott.28977:[log in to unmask]
> 050209 20:04:23 5594 XrdLink: schott.28977:[log in to unmask] disconnected after 0:00:12
>
> -- gregory
>
>
> On Wed, 9 Feb 2005, Miriam Fritsch wrote:
>
> >
> > Hi Gregory,
> >
> > I tried to read via xrootd again, but the xrootd process crashes 15 sec
> > after the start of my job. The error messages in my logfile start with
> > -------------------------------------------------------------------------
> > 2005-02-09 19:42:35 10574 SysError: TUnixSystem::UnixRecv          - recv
> > (Connection reset by peer)
> > 2005-02-09 19:42:35 10574 SysError: TUnixSystem::DispatchOneEvent  -
> > select: read error on 19
> > (Bad file descriptor)
> > 2005-02-09 19:43:36 10574 Err : TXSocket::RecvRaw              - Request
> > timed out 60 seconds reading 4 bytes from socket -1
> > (server[babar2.fzk.de:1094])
> > 2005-02-09 19:43:36 10574 Err : TXNetConn::DoHandShake         - Error
> > reading 4 bytes from the server [babar2.fzk.de:1094].
> > 2005-02-09 19:43:36 10574 Info: TXNetConn::GetAccessToSrv      - HandShake
> > failed with server [babar2.fzk.de:1094].
> > 2005-02-09 19:43:36 10574 Err : TXNetFile::CreateTXNf          - Access to
> > server failed
> > 2005-02-09 19:43:36 10574 Err : TXConnectionMgr::Disconnect    -
> > Destroying nonexistent logconnid 0.
> > 2005-02-09 19:43:46 10574 SysError: TUnixSystem::UnixTcpConnect    -
> > connect (babar2.fzk.de:1094) (Connection refused)
> > 2005-02-09 19:43:46 10574 Err : TXPhyConnection::Connect       - can't
> > open connection to xrootd/rootd on host [babar2.fzk.de:1094]
> > 2005-02-09 19:43:46 10574 Err : TXNetConn::TXNetFile           - Error
> > creating logical connection with [babar2.fzk.de:1094]
> > 2005-02-09 19:43:46 10574 Err : TXConnectionMgr::Disconnect    -
> > Destroying nonexistent logconnid -1.
> > .....
> > -------------------------------------------------------------------------
> >
> > I can read the collection with KanCollUtil; I tested that before.
> >
> > Cheers,
> >
> > Miriam
> >
> > *************************************************************************
> >
> > Dr. Miriam Fritsch
> >
> > Institut fuer Experimentalphysik I
> > Ruhr-Universitaet Bochum, Germany               email: [log in to unmask]
> > c/o SLAC                                        tel:  +1 (650) 926-3565
> > 2575 Sand Hill Road #34                         fax:  +1 (650) 926-3882
> > Menlo Park, CA 94025, USA                       home: +1 (650) 324-2813
> >
> > *************************************************************************
> >
> > On Wed, 9 Feb 2005, Gregory Schott wrote:
> >
> >>
> >>> Sorry, I have to run with NFS: until now I couldn't test reading and
> >>> writing via xrootd, because the collections were not available via xrootd
> >>> (configuration) or xrootd is not running (today). My opinion is that it is
> >>> more important to run the production than to wait until the tests with
> >>> xrootd are finished ...
> >>>
> >>> As long as I run with NFS, I need the disk space on babar13+babar14 for
> >>> skimming. With xrootd later, I can (have to) use the nas boxes.
> >>
> >>
> >> Hello Miriam,
> >>
> >>    If you want to write with xrootd, I'll just need to know which nas
> >> boxes you want to write to, and I'll set them up with write access to /prod
> >> (right?). Do you also need write access to /store?
> >>    Presently, xrootd is only running for babar6-12, with read access only.
> >>
> >>    xrootd on the redirector babar2 was crashing 3 days ago when trying to
> >> use it, but it seems not to crash anymore. I don't know yet why it crashed.
> >> Can you use it now for reading?
> >>
> >> -- Gregory
> >>
> >
>
> -------------- Dr. Gregory Schott --------------
>   Institut fuer Experimentelle Kernphysik (IEKP)
>       Universitaet Karlsruhe - Postfach 3640
>             76021 Karlsruhe  (Germany)
>              tel.: +49-(0)724782-3537
>              fax.: +49-(0)724782-3414
>             e-mail: [log in to unmask]
> -----------------------------------------------
>