


                 Summary: odd transient behaviour on Google Compute Engine (GCE) storage cluster
                 Project: XROOTD
            Submitted by: bdouglas
            Submitted on: 2013-01-07 12:46
             Report Type: Bug
                Priority: 5 - Normal
                Severity: 3 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
      Fixed by commit(s): 




I have set up an xrootd storage cluster in the Google cloud (GCE)
and have seen this odd behaviour.

I have successfully copied files into the storage, but when I go to locate the
files, sometimes I see them and sometimes I do not.

For example:

root://headnode.c.atlasgce.internal:1094//> dirlist /atlas/local/benjamin/
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-15
-rw-(048)    104857600 2012-12-22 23:19:36
-rw-(048)    104857600 2012-12-22 23:19:23
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-05
-rw-(048)    104857600 2012-12-22 23:19:43
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-20
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-19
-rw-(048)    104857600 2012-12-22 23:19:42
-rw-(048)    104857600 2012-12-22 23:19:40
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-18
-rw-(048)    104857600 2012-12-22 23:19:39
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-17
-rw-(048)    104857600 2012-12-22 23:19:37
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-16
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-14
-rw-(048)    104857600 2012-12-22 23:19:35
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-13
-rw-(048)    104857600 2012-12-22 23:19:33
-rw-(048)    104857600 2012-12-22 23:19:32
drwx(051)         4096 2013-01-07 11:35:09
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-12
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-11
drwx(051)         4096 2013-01-07 11:34:25
-rw-(048)    104857600 2012-12-22 23:19:31
drwx(051)         4096 2013-01-07 11:33:34
-rw-(048)    104857600 2012-12-22 23:19:29
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-10
drwx(051)         4096 2013-01-07 11:33:12
-rw-(048)    104857600 2012-12-22 23:19:28
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-09
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-04
-rw-(048)    104857600 2012-12-22 23:19:21
drwx(051)         4096 2013-01-07 11:32:21
drwx(051)         4096 2013-01-07 11:31:35
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-07
-rw-(048)    104857600 2012-12-22 23:19:25
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-03
-rw-(048)    104857600 2012-12-22 23:19:20
drwx(051)         4096 2013-01-07 11:31:25
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-08
-rw-(048)    104857600 2012-12-22 23:19:27
drwx(051)         4096 2013-01-07 11:30:39
drwx(051)         4096 2013-01-07 11:29:53
-rw-(048)    104857600 2012-12-22 23:19:24
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-06
drwx(051)         4096 2013-01-07 11:29:06
-rw-(048)    104857600 2012-12-22 23:19:19
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-02
-rw-(048)    104857600 2012-12-22 23:19:17
drwx(051)         4096 2013-01-07 11:27:23
drwx(051)         4096 2012-12-22 23:11:53 /atlas/local/benjamin/dpb-test-01
-rw-(048)    104857600 2012-12-22 23:19:16
drwx(051)         4096 2013-01-07 05:41:03
drwx(051)         4096 2012-12-22 23:12:12
drwx(051)         4096 2012-12-22 23:12:01 /atlas/local/benjamin/dpb-test-00
-rw-(048)    104857600 2012-12-22 23:19:14
-rw-(048)           10 2013-01-07 05:13:32

Yet a short time later (after running the help command in xrd):

root://headnode.c.atlasgce.internal:1094//> dirlistrec
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-15
-rw-(048)    104857600 2012-12-22 23:19:36
-rw-(048)    104857600 2012-12-22 23:19:23
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-05
-rw-(048)    104857600 2012-12-22 23:19:43
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-20
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-19
-rw-(048)    104857600 2012-12-22 23:19:42
-rw-(048)    104857600 2012-12-22 23:19:40
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-18
-rw-(048)    104857600 2012-12-22 23:19:39
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-17
-rw-(048)    104857600 2012-12-22 23:19:37
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-16
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-14
-rw-(048)    104857600 2012-12-22 23:19:35
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-13
-rw-(048)    104857600 2012-12-22 23:19:33
-rw-(048)    104857600 2012-12-22 23:19:32
drwx(051)         4096 2013-01-07 11:35:09
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-12
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-11
-rw-(048)    104857600 2012-12-22 23:19:31
-rw-(048)    104857600 2012-12-22 23:19:29
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-10
-rw-(048)    104857600 2012-12-22 23:19:28
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-09
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-04
-rw-(048)    104857600 2012-12-22 23:19:21
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-07
-rw-(048)    104857600 2012-12-22 23:19:25
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-03
-rw-(048)    104857600 2012-12-22 23:19:20
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-08
-rw-(048)    104857600 2012-12-22 23:19:27
-rw-(048)    104857600 2012-12-22 23:19:24
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-06
-rw-(048)    104857600 2012-12-22 23:19:19
drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-02
-rw-(048)    104857600 2012-12-22 23:19:17
drwx(051)         4096 2012-12-22 23:11:53 /atlas/local/benjamin/dpb-test-01
-rw-(048)    104857600 2012-12-22 23:19:16
drwx(051)         4096 2013-01-07 05:41:03
drwx(051)         4096 2012-12-22 23:12:12
drwx(051)         4096 2012-12-22 23:12:01 /atlas/local/benjamin/dpb-test-00
-rw-(048)    104857600 2012-12-22 23:19:14
-rw-(048)           10 2013-01-07 05:13:32
Error 3011: Unable to open directory /atlas/local/benjamin/dpb-test-00-clone;
No such file or directory

In server headnode.c.atlasgce.internal:1094 or in some of its child nodes.

root://headnode.c.atlasgce.internal:1094//> locateall
No matching files were found.

root://headnode.c.atlasgce.internal:1094//> exit

[benjamin@dpb-apf-00 d3pd_testjob]$ xrd headnode.c.atlasgce.internal
No matching files were found.

Now the files are not found?
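
To pin down which entries vanished between the two listings above, the pasted outputs can be compared mechanically. What follows is only a minimal sketch: the regular expression is inferred from the listing lines shown in this report, not from any documented xrd output format, and the names `parse_listing` and `missing_entries` are made up for illustration. Because several lines carry no path, entries are compared as (mode, size, timestamp, path) tuples and counted as a multiset.

```python
import re
from collections import Counter

# Pattern inferred from the listing lines in this report (an assumption), e.g.
#   drwx(051)         4096 2012-12-22 23:13:32 /atlas/local/benjamin/dpb-test-15
#   -rw-(048)    104857600 2012-12-22 23:19:36
ENTRY = re.compile(
    r"^(?P<mode>[d-][rwx-]{3})\(\d+\)\s+(?P<size>\d+)\s+"
    r"(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\s+(?P<path>\S+))?\s*$"
)


def parse_listing(text):
    """Count (mode, size, timestamp, path) tuples; non-listing lines are skipped."""
    entries = Counter()
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            key = (
                match.group("mode"),
                int(match.group("size")),
                match.group("stamp"),
                match.group("path") or "",  # the path may be absent
            )
            entries[key] += 1
    return entries


def missing_entries(before, after):
    """Return entries present in `before` but absent (or fewer) in `after`."""
    difference = parse_listing(before) - parse_listing(after)
    return sorted(difference.elements())
```

Passing the first dirlist output as `before` and the later one as `after` returns the entries that dropped out; prompt lines and error messages are ignored because they do not match the pattern.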

A short time later:
[benjamin@dpb-apf-00 d3pd_testjob]$ xrd headnode.c.atlasgce.internal
(C) 2004-2010 by the Xrootd group. Xrootd version: v3.2.7
Welcome to the xrootd command line interface.
Type 'help' for a list of available commands.
root://headnode.c.atlasgce.internal:1094//> dirlist
Error 3011: Unable to open directory
No such file or directory

In server headnode.c.atlasgce.internal:1094 or in some of its child nodes.
-rw-(048)    801094670 2013-01-07 11:35:18
-rw-(048)   3559362431 2013-01-07 11:35:04
-rw-(048)   3674852869 2013-01-07 11:34:20
-rw-(048)   1708802906 2013-01-07 11:33:29
-rw-(048)   3720590161 2013-01-07 11:33:07
-rw-(048)   3597669746 2013-01-07 11:32:16
-rw-(048)    559179199 2013-01-07 11:31:30
-rw-(048)   3755235005 2013-01-07 11:31:20
-rw-(048)   3748590483 2013-01-07 11:30:34
-rw-(048)   3814234452 2013-01-07 11:29:48
-rw-(048)   3865215588 2013-01-07 11:28:08


The files are found again, although there were no changes to the system in between.

Are there timeouts that I can set to make the system a bit more robust
against these transient issues?
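
Until the right server-side setting is identified, a client-side workaround would be to retry a lookup that comes back empty. The snippet below is a minimal sketch of that idea only. It is not an xrootd timeout or configuration option, the helper name `retry` and its parameters are made up for illustration, and `operation` stands for whatever callable performs the lookup (for example a wrapper that runs the listing and returns the parsed entries).

```python
import time


def retry(operation, attempts=5, initial_delay=1.0, backoff=2.0,
          is_ok=bool, sleep=time.sleep):
    """Call `operation` until `is_ok(result)` is true or attempts run out.

    The delay between attempts grows by `backoff` each time. The last
    result is returned even if it never passed the `is_ok` check, so the
    caller can still inspect it.
    """
    delay = initial_delay
    result = None
    for attempt in range(1, attempts + 1):
        result = operation()
        if is_ok(result):
            return result
        if attempt < attempts:
            sleep(delay)      # wait before the next attempt
            delay *= backoff  # exponential backoff
    return result
```

With the defaults an empty result (an empty list, say) counts as a transient failure and triggers another attempt after 1, 2, 4 and 8 seconds. This only masks the symptom on the client side; it does not explain why the head node intermittently reports no matching files.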


Doug Benjamin


  Message sent via/by LCG Savannah
