HPS-SOFTWARE Archives

HPS-SOFTWARE@LISTSERV.SLAC.STANFORD.EDU

HPS-SOFTWARE March 2016

Subject: Re: Keeping up with DQM
From: Nathan Baltzell <[log in to unmask]>
Reply-To: Software for the Heavy Photon Search Experiment <[log in to unmask]>
Date: Tue, 8 Mar 2016 19:56:53 -0500


From what Maurik said earlier in this thread, he already had
scicomp start transferring them directly to cache today,
while waiting on them to make it to tape.  Is this not working,
or did I misunderstand?



On Mar 8, 2016, at 7:47 PM, Sebouh Paul <[log in to unmask]> wrote:

> Stepan has suggested something that might make things work more smoothly.  Instead of just copying files from the counting house to tape, we should simultaneously copy them to a location in /volatile/, so that I can run the DQM jobs with input from the volatile location.  After the jobs are done, I can delete the files from /volatile/, or I can set the script up so that the file is deleted when the job is over.  Does anyone else think this is a good idea?
> 
> On Tue, Mar 8, 2016 at 4:11 PM, McCormick, Jeremy I. <[log in to unmask]> wrote:
> Hi,
> 
> Rerouted to software list:
> 
> I don't believe swif has a way to directly monitor directories for new files, but we can do this in python.  I'm working with Sebouh on the procedures for this right now.  We already have several useful scripts; they just need to be cleaned up a bit and added to a cron job.  We can also use the datacat to tell if DQM files exist already (depends on how fancy we want to get).
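
A minimal sketch of the directory polling described above, assuming plain Python with only the standard library. The function name `find_new_files`, the `seen` set, and the `*.evio.*` pattern are illustrative assumptions, not the actual scripts mentioned in this thread.

```python
from pathlib import Path


def find_new_files(directory, seen, pattern="*.evio.*"):
    """Return names of files in `directory` that are not yet in `seen`.

    `seen` is a set of file names from earlier calls; it is updated in
    place so the next call only reports files that appeared since.
    The default pattern is an assumption based on names like
    hps_007799.evio.262 quoted elsewhere in the thread.
    """
    current = {p.name for p in Path(directory).glob(pattern) if p.is_file()}
    new = sorted(current - seen)
    seen.update(new)
    return new
```

A cron job could call this once per invocation, provided `seen` is saved to disk between runs (for example as a text file with one name per line).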
> 
> As far as DQM, I suggested Sebouh process runs 7796 - 7800 to start, because these are marked as very good runs in the spreadsheet (good beam).  If there are additional runs that we want to look at specifically right now, please communicate with him about it.
> 
> --Jeremy
> 
> -----Original Message-----
> From: Maurik Holtrop [mailto:[log in to unmask]]
> Sent: Tuesday, March 08, 2016 11:54 AM
> To: Nathan Baltzell
> Cc: Sebouh Paul; McCormick, Jeremy I.; Bradley Yale; Graham, Mathew Thomas; Uemura, Sho; Holly Vance
> Subject: Re: Keeping up with DQM
> 
> Hello Nathan, Sebouh,
> 
> We are now also seeing the files appear at /cache/mss/hallb/hps/data . There are 287 files there right now.
> I am not sure if the "swif" system is smart enough to recognize that the files are there, and thus not also try to queue them from tape. Not having to grab them from tape will be a big plus.
> Sebouh should be able to change his glob to point to the /cache/mss rather than /mss and get things working directly from the cache.
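
The path change suggested above amounts to prefixing the /mss path with /cache. A small sketch, assuming the two trees mirror each other exactly as the paths quoted in this thread suggest; the helper name `cache_path` is hypothetical.

```python
from pathlib import PurePosixPath


def cache_path(mss_path, cache_root="/cache"):
    """Map a tape-stub path under /mss to the matching path under /cache/mss.

    Assumes the cache tree mirrors the /mss tree, as in
    /mss/hallb/hps/data -> /cache/mss/hallb/hps/data.
    """
    p = PurePosixPath(mss_path)
    if p.parts[:2] != ("/", "mss"):
        raise ValueError(f"not an /mss path: {mss_path}")
    # Drop the leading "/" and re-root everything else under cache_root.
    return str(PurePosixPath(cache_root).joinpath(*p.parts[1:]))
```

For example, `cache_path("/mss/hallb/hps/data/hps_007799.evio.262")` gives `/cache/mss/hallb/hps/data/hps_007799.evio.262`, so a glob on the cache directory can replace the one on /mss.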
> 
> I am noticing that although many of last weekend's runs are now on tape, and have been there for a while, there is still no new DQM output. Sebouh, perhaps you can keep your eye on this, and start jobs from the /cache on the files that are appearing there. It will be good to see at least some of the output before too long.
> 
> Best,
>         Maurik
> 
> 
> > On Mar 8, 2016, at 10:56 AM, Nathan Baltzell <[log in to unmask]> wrote:
> >
> > We should see a huge increase in speed of copying data to tape, now
> > that Sergey mounted scicomp's lustre drive on clondaq5.
> >
> > -Nathan
> >
> >
> >
> > On Mar 7, 2016, at 22:35, Maurik Holtrop <[log in to unmask]> wrote:
> >
> >> Hello Sebouh,
> >>
> >> I do wonder if your jobs are stuck because you submitted the job *before* the file actually existed on the tape silo. I see ReconDataQuality_7781 in the job queue, but no 7781 file on the tapes. You can still find those files on clondaq5 in /data/totape, so presumably they haven't been copied yet. This is perhaps strangely slow?
> >>
> >> You may want to double check if jobs that were started before the file existed on tape actually run when that file becomes available on the tape silo, or if these jobs are going to be stuck in perpetuity. You should be able to check.
> >>
> >> I can see on the silo:
> >>
> >> hps@ifarm1102> ls -l /mss/hallb/hps/data/hps_007799.evio.262
> >> -r--r--r-- 1 halldata nobody 441 Mar  7 08:00 /mss/hallb/hps/data/hps_007799.evio.262
> >>
> >> and your job asking for that file (job id 21152089) is still pending....
> >>
> >> If the files are directly put on the /cache drives, in principle this would save a tape operation. I heard that Hall-A is doing this with their data as well.
> >> Making use of these files should not be a lot of changes to your scripts I would think. Just remember that it would be nice to mark the files for deletion when you are done with them.
> >>
> >> Best,
> >>      Maurik
> >>
> >>
> >>
> >>
> >>
> >>> On Mar 7, 2016, at 9:55 PM, Sebouh Paul <[log in to unmask]> wrote:
> >>>
> >>> Jeremy, what do you think of Maurik's suggestion?  It's already Monday night, and all of the DQM jobs from this weekend have been stuck in dependency-limbo, waiting for the files from tape.
> >>>
> >>> On Mon, Mar 7, 2016 at 4:48 PM, Maurik Holtrop <[log in to unmask]> wrote:
> >>> Hello Bradley,
> >>>
> >>> No need to, I think.
> >>>
> >>> If you look at the DQM jobs, you will see that they are all pending on a dependency = the files are not available.
> >>>
> >>> As far as I can see, it is not the job slots but the files, but I would appreciate it if someone else could check that I am coming to the correct conclusion.
> >>>
> >>> I suggest that we move to a slightly different way of processing the data:
> >>>
> >>> * Files that are copied from the counting house to JLab are immediately put on the /cache disk as soon as they are written to tape.
> >>> * Sebouh, or a clever script acting on his behalf, monitors /cache/hallb/hps/data for new files and starts batch jobs to process them immediately.
> >>> * As soon as a file is processed from /cache, the file is marked for deletion. This is needed so that we don't immediately fill our cache quota with raw files and leave no space for other use.
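
The last two steps above (process each new file, then mark it for deletion) can be sketched as follows. This is only an illustration: `process` stands in for whatever submits the batch job, and the "done list" text file is an assumed bookkeeping mechanism, since the thread does not say how files are actually marked for deletion on /cache.

```python
from pathlib import Path


def process_new_files(data_dir, done_list, process):
    """Run `process(path)` on each file not yet recorded in `done_list`.

    A file name is appended to `done_list` only after `process` returns,
    so a failed job leaves the file unmarked and it is retried next time.
    Returns the names handled in this call.
    """
    done_file = Path(done_list)
    done = set(done_file.read_text().split()) if done_file.exists() else set()
    handled = []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file() or path.name in done:
            continue
        process(path)  # hypothetical hook, e.g. submit a DQM job
        with done_file.open("a") as f:
            f.write(path.name + "\n")  # record as safe to delete
        handled.append(path.name)
    return handled
```

Keeping the done list outside the data directory avoids it being picked up as an input file; a separate cleanup step could then remove the listed files from the cache.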
> >>>
> >>> Step one has to be arranged with the computer center. I took the jump and already asked Chris to set this up.
> >>>
> >>> Best,
> >>>     Maurik
> >>>
> >>>
> >>>
> >>>
> >>>> On Mar 7, 2016, at 4:11 PM, Bradley T Yale <[log in to unmask]> wrote:
> >>>>
> >>>> Sorry, I'm killing the pending aprime jobs so that yours can start.
> >>>> These were mainly for increasing the statistics for Omar's analysis, but not as high of a priority I think.
> >>>> The farm was also very crowded over the weekend, which did not help things.
> >>>>
> >>>> From: Sebouh Paul <[log in to unmask]>
> >>>> Sent: Monday, March 7, 2016 12:10 PM
> >>>> To: Maurik Holtrop
> >>>> Cc: Bradley T Yale; Nathan Baltzell; Mathew Thomas Graham; Sho Uemura; Holly Vance
> >>>> Subject: Re: Keeping up with DQM
> >>>>
> >>>> If you have any suggestions as to how to increase priority for the DQM jobs (or decrease it for the other hps jobs that can wait, such as monte carlo) let me know.
> >>>> On Mar 7, 2016 12:06 PM, "Sebouh Paul" <[log in to unmask]> wrote:
> >>>> I have submitted jobs to the farm for the runs in which all or at least most of the files have been transferred to tape, but none of them have started running yet.  My guess is the farm is giving higher priority to the slic_aprimes jobs than to my dqm jobs, since those are sometimes running but none of my dqm jobs have started running yet.
> >>>>
> >>>> On Mar 7, 2016 11:53 AM, "Maurik Holtrop" <[log in to unmask]> wrote:
> >>>> Hello Sebouh,
> >>>>
> >>>> How well are you able to keep up with DQM output as data comes out of the counting house?
> >>>>
> >>>> If we had continuous running, the goal would be to have DQM report within 24 hours of the data being taken, i.e. there would be a summary of DQM at each run meeting on the quality of the data taken the previous day. At this point, I am not yet seeing the DQM output from last Friday-Saturday runs, several of which were 100M+ events, in /lustre/expphy/work/hallb/hps/data/physrun2016/pass0/dqm. Can you please let me know what the throughput of DQM is?
> >>>>
> >>>> Not having this output in a timely manner also hinders the experts that should be looking at this output.
> >>>>
> >>>> Best,
> >>>>        Maurik
> >>>
> >>>
> >>
> >
> 
> ########################################################################
> Use REPLY-ALL to reply to list
> 
> To unsubscribe from the HPS-SOFTWARE list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1
> 
> 
> 
