last update: 13 Nov 96 22:00 EST

Here is the first pass at a 'requirements document'. I try to list all
items that have been brought up so far, in no particular order.

[ IN BRACKETS AND CAPITALS ARE ITEMS WE NEED TO DISCUSS ]

1 - ACCESSIBILITY
-----------------

1.1 - on the net:
Except where explicitly noted, all data base information should be
accessible to any collaborator who has access to the Internet, regardless
of geographical location [DOES THIS NEED TO BE RESTRICTED?].

1.2 - security
--------------
Most information needs to be safe from tampering. Therefore, all data
bases need adequate mechanisms to prevent unauthorized modification or
entry of data.

1.3 - affordable access
-----------------------
A tradeoff between convenience (all-commercial products, no programming
for Phenix) and price (we write everything ourselves) is inevitable. It is
feasible for Phenix to spend a fairly large sum on data base software at a
'central site' [SHALL WE SET A LIMIT ??], but the ability of many
collaborating institutions to spend money on commercial software at their
local computing facilities is limited. Therefore the implementation of the
data base solutions should cost no more than the amount one would spend on
a compiler.

1.4 - partial access
--------------------
The database system should support replication of the whole or parts of a
database on a single machine, across a LAN/WAN, and on standalone machines
(laptops), to improve read performance and data availability. This
replication mechanism, including coherency administration, should be part
of the DB system rather than the responsibility of the application.

1.5 - ease of access
--------------------
There should be a suite of tools that make it easy to insert data into the
data bases and (re-)organize them. There is a lot of information out there
already, in the form of Foxpro data bases, large collections of pictures,
graphs and ascii files, that needs to be transferred into a central place.
If access is too cumbersome, this will not be done and the data will
eventually be lost. What is termed 'easy' depends somewhat on the data
category. For documentation, we have a wide range of people with differing
abilities who need access; document access therefore needs to be
especially user-friendly. For other data types, such as engineering
drawings or calibrations, the audience is much narrower. In any case, ease
of access must be such that no user is tempted to avoid using the data
base.

1.6 - underlying access modes
-----------------------------
The vast majority of users will use standard interfaces to access the
data. The interfaces themselves should be able to use SQL and embedded SQL
to do their work. [ANY OTHERS?]

1.7 - forward compatibility
---------------------------
The future is notoriously hard to predict, but we'd like our data bases to
be forward compatible; they should still be useful in 1-2 decades. The
best way to ensure that is to adhere to widely accepted industry standards
and practices.

1.8 - customizable interfaces
-----------------------------
The interfaces should be easily customizable, so that Phenix-specific
interface features can be implemented. (Here we refer to programmers, not
every user.) In this context, we also specify that there be no impediments
to the programming languages accepted by the rest of Phenix (F77?, C++),
as well as other widely accepted languages (Java, Tk/Tcl).
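
For illustration only (the table, column names and numbers below are
hypothetical, not a proposed schema): a statement like the following could
be issued unchanged from a Tk/Tcl browser, a C++ module or an F77 program,
either through embedded SQL or through a call-level interface, so
Phenix-specific front ends need not tie us to one language.

    SELECT gain, pedestal
      FROM pmt_calibration            -- hypothetical table
     WHERE channel    = 1024
       AND run_number = 12345;
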
1.9 - access speed for retrieval
--------------------------------
There are speed requirements on certain sections of the data base. For
example, the 'online data' need to be available in 'real time' (see the
ONCS writeup, section 3, and the RICH writeup, near the bottom). Other
parts of the data base, such as photographs and engineering drawings, can
be stashed away for retrieval on time scales of hours. Calibration data,
when used by an analysis program, need to be retrievable very quickly once
the analysis is running [ I THINK WE NEED TO BE FLEXIBLE HERE; IF I WANT
TO DO ANALYSIS HERE AT LANL, I MIGHT HAVE TO PREPARE BY DOWNLOADING A
CHUNK OF DATA INTO A LOCAL CACHE, WHICH MIGHT BE ON THE 1 HOUR TIME SCALE,
OR SUFFER THROUGH A FIRST ROUND OF SLOW ACCESS ]

1.10 - access speed for insertion
---------------------------------
The issue of access speed for insertion is more complicated. Typically,
insertion times are 10x retrieval times. There can be longish times for
documents [HOW LONG?], and for most calibrations. However, some
calibrations are produced in 'near real time', and need to be available as
downloadable constants before the next start-of-run. Also, in multi-pass
offline calibration programs, constants derived in pass N need to be
available in pass N+1. This can be solved by saving them internally in the
program, or by providing 'fast enough' data base service [ I THINK WE CAN
SAVE MONEY HERE BY BEING FLEXIBLE, AND PASSING THE BUCK TO THE CALIBRATION
PROGRAMMERS - ARE THERE OTHER DANGERS TO SPECIFYING A? ]

2 - DATA TYPES
--------------
Refer to the summary table for estimates of data volumes by type. Although
our data fall into the broad classes described in sections 2.1-2.4, data
come in many forms. In order to preserve forward compatibility, there
should be no limitation imposed by the data base on the data formats that
can be stored.

2.1 - documents
---------------
In this category fall writeups, copies of talks and slides, manuals, memos
and the like. In general, documents are envisioned to reside 'on the web'.
We need a Phenix-specific search engine to locate info by header, keyword,
content. Portions of this (e.g. project management docs) need to be able
to be roped off from all but authorized readers.

2.2 - drawings
--------------
These are CAD models, engineering drawings and the like. Since CAD files
can only be viewed by experts with the appropriate expensive software,
there is probably no need to have these generally accessible. However,
they should be catalogued. Postscript files can be derived from CAD
models, and we should consider having a fairly complete set of these
available online. Engineering drawings come in a variety (?) of formats
(PDF, ps..). In principle they can be stored for general access, but we
might consider storing in the data base only a catalog of such drawings.
Again, we should consider having postscript copies available online.

2.3 - photographs
-----------------
Some subsystems are taking (MUT) or plan to take (MVD) copious digital
photographs during construction. They should be archived and organized in
a way similar to 'documents', and linked to them.

2.4 - calibrations
------------------
This is what we traditionally think of exclusively in the data base arena.
Calibrations don't need to be cross-linked to the other categories (docs,
photos, drawings). They need a good interface to the software, and to some
browser with graphing capabilities, such that you can quickly display, for
example, the behavior of some channel vs. time, run number, hall
temperature or other parameter in the data base. Should there be standard
keys (run#, time) for all calibrations?
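
If calibrations did carry such standard keys, the graphing browser's job
reduces to a simple query. A minimal sketch, assuming a hypothetical table
(none of these names are a proposed schema):

    CREATE TABLE example_cal (
        run_number  INTEGER,         -- 'standard key' candidate 1
        cal_time    DATE,            -- 'standard key' candidate 2
        channel     INTEGER,
        value       FLOAT,
        sigma       FLOAT
    );

    -- what a graphing browser might issue to plot one channel vs. run number
    SELECT run_number, value
      FROM example_cal
     WHERE channel = 123
     ORDER BY run_number;
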
2.5 - detector vital signs
--------------------------
These are ONCS-type, real-time data: high and low voltages, temperatures,
pressures, configuration descriptions, FPGA programs, downloadable
constants (e.g. from calibs, for zero-suppression). Since this is
real-time, data rates become an issue to be specified. Ancillary data flow
in at a rate of approximately 2.3 Mb/day. In addition, the detector
configuration data (approx 10Mb) needs to be retrievable at startup in a
reasonable (5 min) time, corresponding to 300 Kb/sec.

3.0 - miscellaneous considerations
----------------------------------
GEANT 4.x might actually impose an OODB on the users. The GEANT folks'
policy is to go with whichever OODB RD45 endorses (but it will definitely
be an OODB). So we might be in this business whether we want to or not.

4.0 - subsystem responses
-------------------------

4.1 - Beam-beam
---------------
4.1.1 - Parameters measured during construction:
- pmt gains (a few words X 160) ("a few" = "5 - 20")
- noise (a few words X 160)
- pulse shape (gif) (a few Kbyte X 160)

4.1.2 - Parameters measured online during runs:
- pmt gains (a few words X 128)
- pmt parameters (slewing parameters etc.) (a few words X 128)
- TDC offset (a few words X 128)
- TDC gain (a few words X 128)
- ADC offset (a few words X 128)
- ADC gain (a few words X 128)
- history (pmt assignments, HV values etc.) (a few Kbyte ??)

4.1.3 - Parameters obtained offline in calibration/analysis phase:
- pmt gains (a few words X 128)
- pmt parameters (slewing parameters etc.) (a few words X 128)
- TDC offset (a few words X 128)
- TDC gain (a few words X 128)
- ADC offset (a few words X 128)
- ADC gain (a few words X 128)
- history (parameter changes during calibration/analysis) (a few Kbyte ??)

4.2 - MVD
---------
4.2.1 - Construction data:
We are collecting about 10 Kbytes of data per 256-channel chip. If we have
a few hundred chips passing through our lab, this would add up to 2 Mbytes
of data.

4.2.2 - run-time data:
We have temperature, pressure, flow and voltage measurements coming back
from slow control, totaling a few dozen channels. It is hard to estimate
how much of this we would want to save. On day 1, you'd perhaps like to
save them every minute, just in case there is something to trace back.
After shakedown, much of this can be discarded again, and 1 save/hour may
be sufficient. An even lower rate could be achieved by only writing
parameters when they go out of range, which should be practically never.
Even saving 50 words every minute adds up to only a few dozen Mbytes/year.

> - Data produced by calibration programs:
- Calibration of discriminator thresholds: scan in the vicinity of the
  thresholds by varying the injected charge. 20 charge levels, each
  strobed 300 times to get about 5% statistical accuracy, gives 6K
  calibration events. The results would be reduced to 1 value +- 1 sigma
  per channel.
- Calibration of preamp linearity and gain: 20 charge levels, each 300
  times, gives 6K events. If the preamps are linear, this would reduce to
  2 values per channel; if non-linearities need to be parameterized,
  perhaps 2-4 values per channel.
- ADC pedestal and pedestal widths: same as above, reduced to 1 value +- 1
  sigma per channel.
Database storage requirements are thus: 6-8 words per channel, times
2*10**4 channels, gives maximally 1.6*10**5 words of calibration per week.
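
Purely as an illustration of what '6-8 words per channel' could look like
in a table (one row per channel per calibration; the names below are made
up, not an MVD design):

    CREATE TABLE mvd_channel_cal (
        cal_time    DATE,      -- or run number, whichever becomes the standard key
        channel     INTEGER,
        threshold   FLOAT,     -- discriminator threshold
        thr_sigma   FLOAT,
        gain        FLOAT,     -- preamp gain; one or two extra columns if
        gain_p2     FLOAT,     --   non-linearities need to be parameterized
        pedestal    FLOAT,
        ped_sigma   FLOAT
    );
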
4.2.3 - documentation:
We currently have most of our documents on the web. We expect one to a few
dozen notes per year. We keep a photo album on the web; we expect a few
dozen pictures per year.

4.2.4 - engineering drawings:
Engineering drawings are in Unigraphics. These are files which are
meaningless without the application installed, or without a translator to
another expensive CAD/CAM application. I suggest these drawings be
disregarded in the data base requirements process.

4.2.5 - input access:
We are generating quality control data on silicon detectors. For now they
are (ascii?) files on a Mac generated by Labview. *** how much? ***
We have some data in a Foxpro data base *** do we need to keep this? ***

4.2.6 - retrieval access:
Calibration-like records: there should be a user interface which lets you
search for, sort, list and extract calibrations from the data bases, with
some manipulation and display capabilities too. The interface should be
able to run on what Phenix decides is the lowest-common-denominator
hardware (vt100? x-terminal?...)

> - direct access from analysis programs?
Yes, for calibrations and geometry data.

The web seems like a good place for the documentation, with the following
additions:
* I would like to search through all PHENIX documents (even though they
  may be scattered over many machines). This should be a search over
  Phenix pages only, not the whole web.
* I'd like to search by author, title, keyword, content ...

4.3 - ONCS
----------
4.3.1 - construction data:
4.3.2 - run-time data:
4.3.3 - documentation:
4.3.4 - input access:
4.3.5 - retrieval access:

4.4 - RICH
----------
4.4.1 - construction data:
We have 5120 PMTs in RICH. We have
A) data sheets from the manufacturer (Hamamatsu)
B) test data taken at INS/Tokyo before the tubes are shipped to the US
C) test data taken at SUNY-SB in "supermodules", for each of the tubes.
We are setting up an ORACLE database for those data. S. Salomone at
SUNY-SB is working on it. The data will be organized in separate tables
(relations) with a common key (serial #). Each table has about 10 columns.
Tables (A) and (B) have one entry for each tube, and (C) will have at
least two entries before installation. We are planning to store
"histograms" of the measurements at SB in the database; we are currently
studying how to do this. We will also have a table of (location <-->
serial #), where location is the location of the tube in the PHENIX
detector. We will also have tables for the electronics (FEE modules, HV
modules, DCM); the plan for these is not established yet.

4.4.2 - run-time data:
We plan to monitor
- temperature, Cherenkov radiator gas purity, flow rate etc. from the vessel
- HV for the tubes (one HV channel supplies 8 to 16 tubes)
Those data will be sampled regularly during the run and stored in the
database. The sampling frequency will be at least once per 8-hour shift,
but should be more frequent initially.

We also record the calibration data. I imagine at least two kinds of data
here:
(1) calibration data (pedestal, gain, noise level, etc.) determined from
    the data
(2) calibration data that is loaded into the FEE and DCM during the run.
    (We have a variable-gain amplifier in the FEE; the VGA setting should
    also be recorded.)

Component history should also be recorded. This will be handled naturally
by the (location <--> serial #) table and a history entry for each tube.
For FEE and DCM cards, we may have a "repair history" entry. (A tube is
not repaired if it fails, so there will be no repair history.)
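
Purely as an illustration (this is not the SUNY-SB design), such a history
could be kept as one row per tube per location, with the serial number as
the common key:

    CREATE TABLE rich_tube_location (
        serial_no   INTEGER,       -- common key shared with tables (A), (B), (C)
        location    VARCHAR2(16),  -- position of the tube in the PHENIX detector
        installed   DATE,
        removed     DATE           -- NULL while the tube is still in place
    );
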
The largest demand on the database will come from the calibration data.
Since we have 5120 tubes, and each tube produces at least 8 words (ADC
ped, ADC gain, ADC noise, one-photon resolution, TDC clock, TDC slewing,
TDC t0, HV setting) per calibration, we will have > 40K words = 160K bytes
per calibration. This means we need at least 1 Mb per day of database
storage just for calibration data.

4.4.3 - documentation:
> - Writeups, technical drawings, minutes, photos ...
> - other?
It would be nice to have those documents in a database, too. Is anyone
studying how to do this? If someone in PHENIX sets up a good database for
documents, we will use it.

4.4.4 - input access:
There should be many ways to put the data into the database. The
phototube data sheets from Hamamatsu are supplied as ASCII text, and S.
Salomone has already put them into ORACLE. (I think she used SQL*LOAD.)
The test results from INS and SUNY-SB will be put into ORACLE in a similar
way.

I think direct write access to the database from applications is a policy
issue. We need a method to enter the calibration data determined by the
"calibration modules" of the analysis program. However, this does not
necessarily mean that the calibration modules write the results directly
to the DB. That is a convenient way, but it introduces the possibility
that 'wrong' results are put in the DB. We need to make a policy decision
here. I think the issues to be considered are:
- how to 'certify' the calibration results that are put in the DB
- will all 'calibration data' be kept in the DB? (Is wrong data removed?)
- how to control "write permission" to the DB
- I think we need some "standard keys" for calibrations. What are they:
  run number, time stamp, something else, or a combination?

If we decide that the calibration modules write directly to the DB, we
need a "standard database API". The questions are:
- How is the API defined?
- Who implements it?
Coupled to these are questions about the design of the calibration DB
itself. Namely:
- Who defines the tables (relations) for the DB?
- Who designs/implements/maintains the DB access API?
- How does the design of the DB affect its performance (i.e. speed of
  query, storage space)?
- How is a table definition modified if that becomes necessary? When a
  table is modified, how are the application programs that access it
  modified? How do we keep compatibility between old data and new data?
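
One possible way to attack several of these questions at once (standard
keys, certification, and keeping old data) is sketched below; the names
and numbers are hypothetical, not a proposed design. Every calibration row
carries a validity range and a version, and rows are never overwritten:

    CREATE TABLE rich_tube_cal (
        first_run   INTEGER,    -- validity range in run numbers
        last_run    INTEGER,
        version     INTEGER,    -- bumped on re-calibration; old rows are kept
        certified   CHAR(1),    -- set to 'Y' once the result has been approved
        serial_no   INTEGER,
        adc_ped     FLOAT,
        adc_gain    FLOAT,
        adc_noise   FLOAT,
        tdc_t0      FLOAT
    );

    -- an analysis job picks the newest certified constants covering its run
    -- (tube 100123 and run 1234 are made-up examples)
    SELECT adc_ped, adc_gain, adc_noise, tdc_t0
      FROM rich_tube_cal
     WHERE serial_no = 100123
       AND certified = 'Y'
       AND 1234 BETWEEN first_run AND last_run
       AND version = (SELECT MAX(version) FROM rich_tube_cal
                       WHERE serial_no = 100123
                         AND certified = 'Y'
                         AND 1234 BETWEEN first_run AND last_run);

Write permission on such a table could then be limited to whoever performs
the certification, while read access stays open; the version column also
gives a crude handle on keeping old and new results side by side.
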
4.4.5 - retrieval access:

4.5 - System Engineering and Integration, and Project Management
-----------------------------------------------------------------
4.5.1 - construction data:
SE&I
- Catalog of mechanical drawing numbers, titles, and authors
- Catalog of baselined drawings and documentation
- PHENIX Nomenclature
- Channel characteristics
- Cable and plumbing plant data
- Catalog of electronic systems documentation
- Survey data
- QA documentation (or a catalog of such)
PM
- Index of PHENIX Notes
- List of institutions and participants with pertinent information
  (Phone Book)

Construction and QA data
  QA documentation: a few (5) MB
  Drawing and parts catalog: 2 MB
  Documentation of construction and cable plant: 20 to 100 MB, depending
    on how it is done
  Electronic documentation: several GB in ViewLogic
  Integration model files: 500 MB
  - total . . . . . . . . . . several GB
  - total in a structured DB: 27 MB

4.5.2 - run-time data:
Calibration and Alignment
  Alignment data for all systems in the MFH hall from the BNL survey:
    0.5 MB/year
  - total in a structured DB: 0.5 MB/year

4.5.3 - documentation:
Documentation: Word, Excel and HTML files: a few (10 - 20) MB
  - total . . . . . . . . . . 20 MB
  - total in a structured DB: 0
PM
  Documentation: Word, Excel, and HTML files: a few (10) MB
  (I doubt if much of this stuff will go into a structured database
  system.)
  Phenix notes: these are mostly on paper, but the index and future notes
  may be in electronic form and could go into a DB.

4.5.4 - input access:
4.5.5 - retrieval access:

4.6 - TOF
---------
4.6.1 - construction data:
PMT's
=====
1) Spec of the PMT (manufacturer part #, # of stages, maximum HV rating,
   typical gain, transit time spread)
2) physical size of the PMT
3) 1920 PMT's will be used for TOF. For each PMT we need to record
   A) data sheets (especially blue sensitivity, from Hamamatsu)
   B) initial gain factor measured on the bench
4) We will also have a table of (location <--> serial #), where location
   is the location of the tube in the PHENIX detector.
5) Signal cable length from each PMT (1920)

Scintillators
=============
1) spec of BC404 (rise time, fall time, amount of light with a MIP, etc.)
2) physical size of the two types of scintillator (short and long)
3) physical size of the light guides
4) typical attenuation length

Electronics
===========
1) spec of FEE
2) spec of HV supply
3) spec of DCM
4) ch vs. time constant + offset calibration constants for each FEE ch.
5) a location (PMT) <--> module id table should be stored
6) physical location of the FEE crates
7) signal propagation velocity temperature dependence

Geometry
========
1) physical size of a panel
2) location of each slat on a panel
3) average material thickness in radiation lengths
4) surveyed position of each panel

comments
========
Any major modifications of the system, with dates

4.6.2 - run-time data:
Slow controls
=============
1) temperature and voltages every 2-4 hours (may be related to t0 drift)
2) low voltage and current + fluctuation width every 2-4 hours; to record
   the fluctuation, voltage and current have to be monitored continuously
   but written to the database once every 2-4 hours.
3) sometimes the ARCNET chips need initialization. It would be useful if
   this were monitored and the status stored in the database at the
   frequency of system initialization.
4) system clock frequency in front of the FEE crate

Calibration program output
==========================
1) time resolution of each slat (960) from laser calibration, once per day
2) relative timing difference slat by slat (960), every 2-4 hours
3) t0 after slewing correction (960), every 2-4 hours
4) PMT gains (1920), every week
5) slewing parameters per PMT (1920), every 1-2 weeks
6) light propagation time through the scintillator + position offset per
   slat (960), every 2-4 weeks

Component histories etc.
========================
1) dead/bad channel status by PMT, laser calibration fiber, HV module,
   FEE module, FEE crate, DCM
2) FEE and HV module replacement history
3) ARCNET initialization packets sent and read back (~200 words * 120 FEE
   modules)

4.6.3 - documentation:
We are working on writing up all documents. We do not know what the best
way to post them is. On the WEB?
4.6.4 - input access:
1) The most important thing is that we can get information very quickly
   (less than 1 s per calibration parameter set).
2) It would be nice to have dead channels appear in the event display in a
   distinguishable color; that information should be automatically
   available to the event display.
3) A fancy browser in which you can define columns and scope, with
   searchability.

In AGS-E802 we were managing only one official database file, and that was
the only place where we could try out database programs. It turned out
that people entered values into it on a trial-and-error basis, which
caused a lot of garbage to accumulate in the official data base and slowed
down access unnecessarily. It would be nice to have a database test area
where people can input their data to debug their database retrieval
programs and their database values. Or it might be advantageous to have
one person take care of the database retrieval programs and tailor them to
each subsystem. The main database should be protected from the unnecessary
garbage which tends to accumulate if there is no flow control.

4.6.5 - retrieval access:
It would be nice to have a program which graphically shows the differences
in the calibration parameters from set to set, so that we can spot unusual
behavior of the parameters. Most likely such behavior is caused by a fit
failure.

4.7 - Central Tracking
----------------------
4.7.1 - construction data:
In Central Tracking we have:
1) 48 TEC chambers
2) 40 DC keystones
3) 32 PC chambers
The kinds of data we will have are
A) Wire Tension measurements
B) Wire Position measurements
C) Wire Gain measurements
D) Electronics Gain measurements
E) Electronics Noise measurements
F) Electronics Pedestal measurements
This information will amount to about 30 Mb of data.

4.7.2 - run-time data:
We plan to monitor probably once per hour, but we have not studied this
yet; it is possible that some quantities need to be updated more
frequently (every 10-15 minutes) and others only once per 8-hour shift:
- temperature, pressure, gas flow
- HV, LV, serial information downloaded into the preamps, thresholds etc.
- maybe alignment
We will also have to keep a time history of the correspondence wire number
- preamp channel # - FEM # - DCM # - power supply # - high voltage supply #
- timing module #. We will also store the calibration data in the database,
and this will be the largest amount of data in it. Assuming we need a
calibration file every 30 minutes (TEC performance, for example, depends
on accurate gain equalization), we might have to store 30 Gbytes of data
in the database per year.

4.7.3 - documentation:
I think we would like to keep most of the documents on the WWW, but it
would be nice to have a database able to do a search using our defined
keywords and returning URL addresses. Even technical drawings could be
translated into postscript or PDF and use a similar system. If we only
store pointers, this is very little data (1 Mb?).
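
A pointer catalog of that sort could be as simple as the sketch below
(hypothetical names, shown only to indicate the scale of the thing; one
row per document-keyword pair, for brevity):

    CREATE TABLE phenix_doc_catalog (
        doc_id    INTEGER,
        title     VARCHAR2(200),
        author    VARCHAR2(80),
        keyword   VARCHAR2(40),
        url       VARCHAR2(200)
    );

    -- 'search using our defined keywords and returning URL addresses'
    SELECT title, url
      FROM phenix_doc_catalog
     WHERE keyword = 'drift chamber'
     ORDER BY title;
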
4.7.4 - input access:
We haven't quite made a decision on which database to use during
construction; most likely it will be FoxPro.

4.7.5 - retrieval access:
I think any database is a good tool only if it has an easy enough
interface that most people will be able to use it. It would be nice if we
could access the database information through the WEB, especially the
documentation part. The database used for online/offline calibration and
serial information should have some performance requirements, so that it
will never be the bottleneck while trying to analyze or, even worse, take
data. Also, we definitely want the data to be accessible from analysis
programs. It is not clear to me whether we should have a single database
to store both the calibration and documentation information.