
Hello Jeremy,

I am afraid that things that are fairly easy to do on your own laptop or desktop machine are not so trivial on one of the clon machines. There are many reasons for this, one of which is the difficulty of moving anything from these machines to anywhere else.

Most likely the process takes up too much memory when it maps the file. That is why only the last file of a run can be used: that file is usually much smaller than the ~2 GB of each of the other files. Carl Timmer has nothing to do with this; it is not a bug in JEVIO, and we have already solved the sequential-read issue, which would overcome this problem.
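
To illustrate the difference between the two access modes, here is a generic Java NIO sketch. It is not the JEVIO API, and the class MapOrStream is hypothetical. A memory-mapped read asks the OS to map the whole file into the process address space at once; when that mmap call fails for lack of memory or address space (for example under a per-process limit), the JDK reports "java.io.IOException: Map failed". A single MappedByteBuffer also cannot exceed Integer.MAX_VALUE bytes (about 2 GB). A sequential read instead streams the file through a small fixed-size buffer, so its memory use does not depend on the file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch, not JEVIO: read a file mapped if possible, sequentially otherwise. */
public final class MapOrStream {
    private MapOrStream() {}

    /** Sums all bytes of the file (as unsigned values) so that every byte is actually read. */
    public static long sumBytes(Path file, boolean tryMapping) throws IOException {
        if (tryMapping) {
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                long size = channel.size();
                // One MappedByteBuffer cannot cover more than Integer.MAX_VALUE bytes.
                if (size <= Integer.MAX_VALUE) {
                    MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
                    long sum = 0;
                    while (buf.hasRemaining()) {
                        sum += buf.get() & 0xFF;
                    }
                    return sum;
                }
            } catch (IOException mapFailed) {
                // e.g. "java.io.IOException: Map failed": fall back to sequential reading below.
            }
        }
        // Sequential reading through a fixed 64 KiB buffer.
        long sum = 0;
        byte[] chunk = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                for (int i = 0; i < n; i++) {
                    sum += chunk[i] & 0xFF;
                }
            }
        }
        return sum;
    }
}
```

Calling sumBytes(path, true) maps the file when it can and falls back to streaming otherwise; sumBytes(path, false) always streams.
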
We can now copy the files to /w/stage3/BUFFERED and access them from other machines. I verified that this works for the file hps_004892.evio.0 using the svt_timing_in_monitoring steering file I was trying to run this morning, so this is one good option.
I think the SVT monitoring steering files had code in them that was not checked in, so I could not run them from my own jar file, which can do the sequential read. I don't know how to take some classes from one jar and combine them with classes from another jar. In other words, I don't know how to "link" Omar's jar with my Jevio.jar the way I could with LD_PRELOAD for compiled code. Is there a Java equivalent?
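
There is no direct LD_PRELOAD equivalent, but the classpath behaves similarly: with the standard application class loader, the first classpath entry that contains a given class wins. Launching with something like java -cp jevio-new.jar:hps-distribution-bin.jar <main class> should therefore let the newer JEVIO classes shadow the copies bundled in the distribution jar; the jar names and main class here are placeholders, and note that java -jar ignores -cp entirely. The small helper below (WhichJar, hypothetical) is a sketch for confirming which jar a class was actually loaded from; the JEVIO class name mentioned in its comment is an assumption.

```java
import java.security.CodeSource;

/** Hypothetical helper: report where a class was loaded from (jar, directory, or the JDK). */
public final class WhichJar {
    private WhichJar() {}

    public static String locate(String className) {
        try {
            // Load without initializing, through the same class loader as this helper.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<bootstrap/JDK>"; // core JDK classes have no code source
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        // Example argument (assumed JEVIO class name): org.jlab.coda.jevio.EvioReader
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same -cp as the application; if the printed location is the new JEVIO jar, the shadowing worked.
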
From the clondaq5 machine, it is a tall order to get the huge crash dump from the screen into an email message. You cannot pipe it into "mail -c [log in to unmask]", because sendmail is not set up. I would need to get it into a file somehow, transfer the file to the outside, then copy it to my laptop and send it to you. It can be done, but when you are under real time pressure to get things done, that is too many steps and too many passwords. I was able to do it now while we are waiting for the beam to be restored.
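
One way to avoid copying a crash dump off the screen is to have Java's built-in java.util.logging write SEVERE messages, including the full traceback, to a file as they happen. This is a sketch under assumptions, not how the monitoring application is actually configured; the class CrashLogToFile, the logger name, and the crash.txt path are hypothetical. Simply redirecting the console output to a file when launching the application would achieve much the same thing.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Hypothetical sketch: send log records, with their tracebacks, to a plain-text file. */
public final class CrashLogToFile {
    private CrashLogToFile() {}

    /** Attaches a plain-text file handler to the logger and returns it (caller closes it). */
    public static FileHandler attach(Logger logger, String path) throws IOException {
        FileHandler handler = new FileHandler(path, true); // true = append to an existing file
        handler.setFormatter(new SimpleFormatter());      // plain text instead of the default XML
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);
        return handler;
    }

    public static void main(String[] args) throws IOException {
        Logger log = Logger.getLogger("crash-demo");
        FileHandler fh = attach(log, args.length > 0 ? args[0] : "crash.txt");
        try {
            throw new IOException("Map failed");
        } catch (IOException e) {
            // Passing the Throwable is what puts the full traceback into the file.
            log.log(Level.SEVERE, "could not open EVIO file", e);
        } finally {
            fh.close();
        }
    }
}
```
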
Attached files: crash1.txt shows the crash when running on clondaq5 and opening an EVIO file. crash2.txt shows the crash when I run the svt_monitoring from my own hps-distribution-bin.jar. The error messages are not very clear, but I think in the latter case there are resources missing from the jar. I think Omar has just checked in all the required code, so I will try again with the latest version.
I do not have the tools to upload the new jevio jar so that we can have it in our distribution instead of the old one. I don't understand why this is taking you so long, since this is a 100% solved issue, as I told you before. The new jar is a drop-in replacement for the old one.
There is no problem opening any of the evio.x files on a normal machine, or when running in sequential mode. If you are running on clondaq5, then only the smaller files seem to work (I haven't tried this).
It used to be the opposite: the last file would not open if the DAQ had crashed, and we have had a lot of DAQ crashes. That issue has already been fixed.
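
To check which files of a run can be mapped on a given machine, something along these lines could serve as a reproducer. It is a hypothetical sketch (RunFileCheck is not part of HPS Java or JEVIO) that assumes the hps_NNNNNN.evio.N naming used in this thread, tries to memory-map each file of one run in sequence order, and reports OK or the failure message.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical reproducer: try to memory-map every file of one run and report the outcome. */
public final class RunFileCheck {
    private RunFileCheck() {}

    /** Returns file name -> "OK (n bytes)" or a failure message, in file-sequence order. */
    public static Map<String, String> check(Path dir, int run) throws IOException {
        String prefix = String.format(Locale.ROOT, "hps_%06d.evio.", run);
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path p : ds) {
                files.add(p);
            }
        }
        // Sort by numeric suffix so that .10 comes after .9.
        files.sort((a, b) -> Long.compare(suffix(a, prefix), suffix(b, prefix)));
        Map<String, String> result = new LinkedHashMap<>();
        for (Path p : files) {
            result.put(p.getFileName().toString(), tryMap(p));
        }
        return result;
    }

    private static long suffix(Path p, String prefix) {
        String tail = p.getFileName().toString().substring(prefix.length());
        try {
            return Long.parseLong(tail);
        } catch (NumberFormatException e) {
            return Long.MAX_VALUE; // unexpected names go last
        }
    }

    private static String tryMap(Path p) {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                return "TOO LARGE for one mapping (" + size + " bytes)";
            }
            ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            return "OK (" + size + " bytes)";
        } catch (IOException e) {
            return "FAILED: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java RunFileCheck /data/hps 4870
        Map<String, String> r = check(Paths.get(args[0]), Integer.parseInt(args[1]));
        for (Map.Entry<String, String> e : r.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Mappings are only released when the JVM garbage-collects them, so on a memory-limited machine a later file could fail only because earlier mappings are still held; checking one file per JVM invocation would rule that out.
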

Let’s please move forward and start using the latest jevio 4.4.5 instead of the old one. 

Best,
	Maurik

 



> On Apr 20, 2015, at 4:38 PM, McCormick, Jeremy I. <[log in to unmask]> wrote:
> 
> Hi, Maurik.  
> 
> It seems to me that there are the following issues to solve here:
> 
> 1) EVIO memory mapping fails on clondaq5.  
> 
> Do we know why?  Have you or anyone else contacted Carl Timmer about this issue?  He is the appropriate go-to person for all Java EVIO issues, and in my experience he is quite responsive when issues arise.  If there is a per-process memory limit on this machine (maybe from a ulimit setting?), can it be increased so that the ~2.0 GB files are mappable?  Is copying data files to /work or /volatile, where they are accessible from less-restricted nodes, a viable workaround?  Is running memory-intensive GUI applications on the "daq" machine even something we should be doing?  (I don't know for sure.  It just strikes me as not a great idea to run a process that potentially takes up 2+ GB of RAM on part of the DAQ system.)
> 
> 2) The SVT monitoring steering config you are trying to use is not working.  
> 
> Without specific traceback information, it is difficult to say much.  Do you see a full error traceback printed to the console?  Most log messages that show up as WARNING or SEVERE should have a corresponding traceback printed to the console.  I spoke with Omar, who told me that the SVT steering configs are all working for him.  Perhaps you just need to pull the latest from trunk to get this working?
> 
> 3) We should use a new JEVIO version that supports sequential reading.
> 
> I can work on getting this going, but first I would like to tag what we have working right now by making an HPS Java release (planned for this evening).  Then I'd like to work from the 4.4.6-SNAPSHOT that Carl Timmer is currently developing in the new CODA Maven repo (yay!).  Any and all changes supporting sequential reading of EVIO files should go into the official JEVIO distribution.  It wasn't clear to me whether you were working from an official JEVIO release jar downloaded from the CODA website or had hacked up JEVIO yourself, so perhaps you can clarify this for me.
> 
> 4) Only the last file in a run is readable with the EvioReader.
> 
> This was not mentioned in your email, but Omar reported it to me this morning.  I have never seen this problem, though.  Apparently, only the last file in a run can be read into our framework.  Can someone confirm?  How can I reproduce this issue?
> 
> Anything else that is in urgent need of attention right now?
> 
> --Jeremy
> 
>> On Apr 20, 2015, at 4:46 AM, Maurik Holtrop <[log in to unmask]> wrote:
>> 
>> Hello Jeremy,
>> 
>> 
>> We have a bit of a productivity killer in the following dilemma:  
>> 
>> clondaq5 cannot run the monitoring app and read from a file. It dies with "Map failed", which I think happens when opening the EVIO file and memory-mapping it.
>> clondaq5 is the only machine in the counting house that can directly "see" the /data/hps drive where all our data is stored.
>> There is no other drive or space available to use (/volatile or /work) where we can copy a data file to be analyzed on another machine.
>> 
>> The result is that it is nearly impossible to do a quick replay of a data file using the monitoring app.
>> When you are in the counting house and need to quickly get something looked at, this is a real issue.
>> 
>> From a quick look, it *appears* that the sequential read works. However, when trying the SVT timing in monitoring, I then get a new error: "Error setting up LCSim", after "adding coditions listener".
>> 
>> Result: I have no plots to show of all the really hard work that was done tonight to time in the SVT and get some tracks. This is a real shame!
>> 
>> Best,
>>   Maurik
>> 
>> CONFIG: added EvioDetectorConditionsProcessor to job with detector 
>> HPS-EngRun2015-Nominal-v0 Opening reader for file /data/hps/hps_004870.evio.0 ...
>> Mon Apr 20 07:16:49 EDT 2015 MonitoringApplication log
>> SEVERE: java.io.IOException: Map failed
>> java.lang.RuntimeException: java.io.IOException: Map failed
>>   at org.hps.record.evio.EvioFileSource.openReader(EvioFileSource.java:157)
>>   at org.hps.record.evio.EvioFileSource.<init>(EvioFileSource.java:51)
>>   at org.hps.record.composite.CompositeLoop.setCompositeLoopConfiguration(CompositeLoop.java:261)
>>   at org.hps.record.composite.CompositeLoop.<init>(CompositeLoop.java:94)
>>   at org.hps.monitoring.application.EventProcessing.setupLoop(EventProcessing.java:595)
>>   at org.hps.monitoring.application.EventProcessing.setup(EventProcessing.java:444)
>>   at org.hps.monitoring.application.MonitoringApplication.startSession(MonitoringApplication.java:996)
>>   at org.hps.monitoring.application.MonitoringApplication.actionPerformed(MonitoringApplication.java:335)
>>   at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
>>   at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
>>   at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>   at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>   at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
>>   at java.awt.Component.processMouseEvent(Component.java:6516)
>>   at javax.swing.JComponent.processMouseEvent(JComponent.java:3320)
>>   at java.awt.Component.processEvent(Component.java:6281)
>>   at java.awt.Container.processEvent(Container.java:2229)
>>   at java.awt.Component.dispatchEventImpl(Component.java:4872)
>>   at java.awt.Container.dispatchEventImpl(Container.java:2287)
>>   at java.awt.Component.dispatchEvent(Component.java:4698)
>>   at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
>>   at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4492)
>>   at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
>>   at java.awt.Container.dispatchEventImpl(Container.java:2273)
>>   at java.awt.Window.dispatchEventImpl(Window.java:2719)
>>   at java.awt.Component.dispatchEvent(Component.java:4698)
>>   at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:735)
>>   at java.awt.EventQueue.access$200(EventQueue.java:103)
>>   at java.awt.EventQueue$3.run(EventQueue.java:694)
>> 
> 
> ########################################################################
> Use REPLY-ALL to reply to list
> 
> To unsubscribe from the HPS-SOFTWARE list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1

