last update: 13 Nov 96 22:00 EST

Here is the first pass at a 'requirements document'. I try to list all
items that have been brought up so far, in no particular order.

[ IN BRACKETS AND CAPITALS ARE ITEMS WE NEED TO DISCUSS ]

1 - ACCESSIBILITY
-----------------

1.1 - on the net:
Except where explicitly noted, all data base information should be
accessible to any collaborator who has access to the Internet, regardless
of geographical location [DOES THIS NEED TO BE RESTRICTED?].

1.2 - security
--------------
Most information needs to be safe from tampering. Therefore, all data
bases need adequate mechanisms to prevent unauthorized modification or
entry of data.

1.3 - affordable access
-----------------------
A tradeoff between convenience (all-commercial products, no programming
for Phenix) and price (we write everything ourselves) is inevitable. It is
feasible for Phenix to spend a fairly large sum on data base software at a
'central site' [SHALL WE SET A LIMIT ??], but the ability of many
collaborating institutions to spend money on commercial software at their
local computing facilities is limited. Therefore the implementation of the
data base solutions should cost no more than the amount one would spend on
a compiler.

1.4 - partial access
--------------------
The database system should support replication of the whole or parts of a
database on a single machine, across a LAN/WAN, and on standalone machines
(laptops), to improve read performance and data availability. This
replication mechanism, including coherency administration, should be part
of the DB system rather than the responsibility of the application.

1.5 - ease of access
--------------------
There should be a suite of tools that make it easy to insert data into the
data bases and (re-)organize them. There is a lot of information out there
already, in the form of Foxpro data bases, large collections of pictures,
graphs and ascii files, that needs to be transferred into a central place.
If access is too cumbersome, this will not be done and the data will
eventually be lost. What is termed 'easy' depends somewhat on the data
category. For documentation, we have a wide range of people with differing
abilities who need access; document access therefore needs to be
especially user-friendly. For other data types, such as engineering
drawings or calibrations, the audience is much narrower. In any case, ease
of access must be such that no user is tempted to avoid using the data
base.

1.6 - underlying access modes
-----------------------------
The vast majority of users will use standard interfaces to access the
data. The interfaces themselves should be able to use SQL and embedded SQL
to do their work. [ANY OTHERS?]

1.7 - forward compatibility
---------------------------
The future is notoriously hard to predict, but we'd like our data bases to
be forward compatible; they should still be useful in 1-2 decades. The
best way to ensure that is to adhere to widely accepted industry standards
and practices.

1.8 - customizable interfaces
-----------------------------
The interfaces should be easily customizable, so that Phenix-specific
interface features can be implemented. (Here we refer to programmers, not
every user.) In this context, we also specify that there be no impediments
to the programming languages accepted by the rest of Phenix (F77?, C++),
as well as other widely accepted languages (Java, Tk/Tcl).
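
For illustration only (the table, column names and numbers below are
hypothetical, not a proposed schema): a statement like the following could
be issued unchanged from a Tk/Tcl browser, a C++ module or an F77 program,
either through embedded SQL or through a call-level interface, so
Phenix-specific front ends need not tie us to one language.

    SELECT gain, pedestal
      FROM pmt_calibration            -- hypothetical table
     WHERE channel    = 1024
       AND run_number = 12345;
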
1.9 - access speed for retrieval
--------------------------------
There are speed requirements on certain sections of the data base. For
example, the 'online data' need to be available in 'real time' (see the
ONCS writeup, section 3, and the RICH writeup, near the bottom). Other
parts of the data base, such as photographs and engineering drawings, can
be stashed away for retrieval on time scales of hours. Calibration data,
when used by an analysis program, need to be retrievable very quickly once
the analysis is running [ I THINK WE NEED TO BE FLEXIBLE HERE; IF I WANT
TO DO ANALYSIS HERE AT LANL, I MIGHT HAVE TO PREPARE BY DOWNLOADING A
CHUNK OF DATA INTO A LOCAL CACHE, WHICH MIGHT BE ON THE 1 HOUR TIME SCALE,
OR SUFFER THROUGH A FIRST ROUND OF SLOW ACCESS ]

1.10 - access speed for insertion
---------------------------------
The issue of access speed for insertion is more complicated. Typically,
insertion times are 10x retrieval times. There can be longish times for
documents [HOW LONG?], and for most calibrations. However, some
calibrations are produced in 'near real time', and need to be available as
downloadable constants before the next start-of-run. Also, in multi-pass
offline calibration programs, constants derived in pass N need to be
available in pass N+1. This can be solved by saving them internally in the
program, or by providing 'fast enough' data base service [ I THINK WE CAN
SAVE MONEY HERE BY BEING FLEXIBLE, AND PASSING THE BUCK TO THE CALIBRATION
PROGRAMMERS - ARE THERE OTHER DANGERS TO SPECIFYING A? ]

2 - DATA TYPES
--------------
Refer to the summary table for estimates of data volumes by type. Although
our data fall into the broad classes described in sections 2.1-2.4, data
come in many forms. In order to preserve forward compatibility, there
should be no limitation imposed by the data base on the data formats that
can be stored.

2.1 - documents
---------------
In this category fall writeups, copies of talks and slides, manuals, memos
and the like. In general, documents are envisioned to reside 'on the web'.
We need a Phenix-specific search engine to locate info by header, keyword,
content. Portions of this (e.g. project management docs) need to be able
to be roped off from all but authorized readers.

2.2 - drawings
--------------
These are CAD models, engineering drawings and the like. Since CAD files
can only be viewed by experts with the appropriate expensive software,
there is probably no need to have these generally accessible. However,
they should be catalogued. Postscript files can be derived from CAD
models, and we should consider having a fairly complete set of these
available online. Engineering drawings come in a variety (?) of formats
(PDF, ps..). In principle they can be stored for general access, but we
might consider storing in the data base only a catalog of such drawings.
Again, we should consider having postscript copies available online.

2.3 - photographs
-----------------
Some subsystems are taking (MUT) or plan to take (MVD) copious digital
photographs during construction. They should be archived and organized in
a way similar to 'documents', and linked to them.

2.4 - calibrations
------------------
This is what we traditionally think of exclusively in the data base arena.
Calibrations don't need to be cross-linked to the other categories (docs,
photos, drawings). They need a good interface to the software, and to some
browser with graphing capabilities, such that you can quickly display, for
example, the behavior of some channel vs. time, run number, hall
temperature or other parameter in the data base. Should there be standard
keys (run#, time) for all calibrations?
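
If calibrations did carry such standard keys, the graphing browser's job
reduces to a simple query. A minimal sketch, assuming a hypothetical table
(none of these names are a proposed schema):

    CREATE TABLE example_cal (
        run_number  INTEGER,         -- 'standard key' candidate 1
        cal_time    DATE,            -- 'standard key' candidate 2
        channel     INTEGER,
        value       FLOAT,
        sigma       FLOAT
    );

    -- what a graphing browser might issue to plot one channel vs. run number
    SELECT run_number, value
      FROM example_cal
     WHERE channel = 123
     ORDER BY run_number;
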
2.5 - detector vital signs
--------------------------
These are ONCS-type, real-time data: high and low voltages, temperatures,
pressures, configuration descriptions, FPGA programs, downloadable
constants (e.g. from calibs, for zero-suppression). Since this is
real-time, data rates become an issue to be specified. Ancillary data flow
in at a rate of approximately 2.3 Mb/day. In addition, the detector
configuration data (approx 10Mb) needs to be retrievable at startup in a
reasonable (5 min) time, corresponding to 300 Kb/sec.

3.0 - miscellaneous considerations
----------------------------------
GEANT 4.x might actually impose an OODB on the users. The GEANT folks'
policy is to go with whichever OODB RD45 endorses (but it will definitely
be an OODB). So we might be in this business whether we want to or not.

4.0 - subsystem responses
-------------------------

4.1 - Beam-beam
---------------
4.1.1 - Parameters measured during construction:
- pmt gains (a few words X 160) ("a few" = "5 - 20")
- noise (a few words X 160)
- pulse shape (gif) (a few Kbyte X 160)

4.1.2 - Parameters measured online during runs:
- pmt gains (a few words X 128)
- pmt parameters (slewing parameters etc.) (a few words X 128)
- TDC offset (a few words X 128)
- TDC gain (a few words X 128)
- ADC offset (a few words X 128)
- ADC gain (a few words X 128)
- history (pmt assignments, HV values etc.) (a few Kbyte ??)

4.1.3 - Parameters obtained offline in calibration/analysis phase:
- pmt gains (a few words X 128)
- pmt parameters (slewing parameters etc.) (a few words X 128)
- TDC offset (a few words X 128)
- TDC gain (a few words X 128)
- ADC offset (a few words X 128)
- ADC gain (a few words X 128)
- history (parameter changes during calibration/analysis) (a few Kbyte ??)

4.2 - MVD
---------
4.2.1 - Construction data:
We are collecting about 10 Kbytes of data per 256-channel chip. If we have
a few hundred chips passing through our lab, this would add up to 2 Mbytes
of data.

4.2.2 - run-time data:
We have temperature, pressure, flow and voltage measurements coming back
from slow control, totaling a few dozen channels. It is hard to estimate
how much of this we would want to save. On day 1, you'd perhaps like to
save them every minute, just in case there is something to trace back.
After shakedown, much of this can be discarded again, and 1 save/hour may
be sufficient. An even lower rate could be achieved by only writing
parameters when they go out of range, which should be practically never.
Even saving 50 words every minute adds up to only a few dozen Mbytes/year.

> - Data produced by calibration programs:
- Calibration of discriminator thresholds: scan in the vicinity of the
  thresholds by varying the injected charge. 20 charge levels, each
  strobed 300 times to get about 5% statistical accuracy, gives 6K
  calibration events. The results would be reduced to 1 value +- 1 sigma
  per channel.
- Calibration of preamp linearity and gain: 20 charge levels, each 300
  times, gives 6K events. If the preamps are linear, this would reduce to
  2 values per channel; if non-linearities need to be parameterized,
  perhaps 2-4 values per channel.
- ADC pedestal and pedestal widths: same as above, reduced to 1 value +- 1
  sigma per channel.
Database storage requirements are thus: 6-8 words per channel, times
2*10**4 channels, gives maximally 1.6*10**5 words of calibration per week.
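
Purely as an illustration of what '6-8 words per channel' could look like
in a table (one row per channel per calibration; the names below are made
up, not an MVD design):

    CREATE TABLE mvd_channel_cal (
        cal_time    DATE,      -- or run number, whichever becomes the standard key
        channel     INTEGER,
        threshold   FLOAT,     -- discriminator threshold
        thr_sigma   FLOAT,
        gain        FLOAT,     -- preamp gain; one or two extra columns if
        gain_p2     FLOAT,     --   non-linearities need to be parameterized
        pedestal    FLOAT,
        ped_sigma   FLOAT
    );
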
4.2.3 - documentation:
We currently have most of our documents on the web. We expect one to a few
dozen notes per year. We keep a photo album on the web; we expect a few
dozen pictures per year.

4.2.4 - engineering drawings:
Engineering drawings are in Unigraphics. These are files which are
meaningless without the application installed, or without a translator to
another expensive CAD/CAM application. I suggest these drawings be
disregarded in the data base requirements process.

4.2.5 - input access:
We are generating quality control data on silicon detectors. For now they
are (ascii?) files on a Mac generated by Labview. *** how much? ***
We have some data in a Foxpro data base *** do we need to keep this? ***

4.2.6 - retrieval access:
Calibration-like records: there should be a user interface which lets you
search for, sort, list and extract calibrations from the data bases, with
some manipulation and display capabilities too. The interface should be
able to run on what Phenix decides is the lowest-common-denominator
hardware (vt100? x-terminal?...)

> - direct access from analysis programs?
Yes, for calibrations and geometry data.

The web seems like a good place for the documentation, with the following
additions:
* I would like to search through all PHENIX documents (even though they
  may be scattered over many machines). This should be a search over
  Phenix pages only, not the whole web.
* I'd like to search by author, title, keyword, content ...

4.3 - ONCS
----------
4.3.1 - construction data:
4.3.2 - run-time data:
4.3.3 - documentation:
4.3.4 - input access:
4.3.5 - retrieval access:

4.4 - RICH
----------
4.4.1 - construction data:
We have 5120 PMTs in RICH. We have
A) data sheets from the manufacturer (Hamamatsu)
B) test data taken at INS/Tokyo before the tubes are shipped to the US
C) test data taken at SUNY-SB in "supermodules", for each of the tubes.
We are setting up an ORACLE database for those data. S. Salomone at
SUNY-SB is working on it. The data will be organized in separate tables
(relations) with a common key (serial #). Each table has about 10 columns.
Tables (A) and (B) have one entry for each tube, and (C) will have at
least two entries before installation. We are planning to store
"histograms" of the measurements at SB in the database; we are currently
studying how to do this. We will also have a table of (location <-->
serial #), where location is the location of the tube in the PHENIX
detector. We will also have tables for the electronics (FEE modules, HV
modules, DCM); the plan for these is not established yet.

4.4.2 - run-time data:
We plan to monitor
- temperature, Cherenkov radiator gas purity, flow rate etc. from the vessel
- HV for the tubes (one HV channel supplies 8 to 16 tubes)
Those data will be sampled regularly during the run and stored in the
database. The sampling frequency will be at least once per 8-hour shift,
but should be more frequent initially.

We also record the calibration data. I imagine at least two kinds of data
here:
(1) calibration data (pedestal, gain, noise level, etc.) determined from
    the data
(2) calibration data that is loaded into the FEE and DCM during the run.
    (We have a variable-gain amplifier in the FEE; the VGA setting should
    also be recorded.)

Component history should also be recorded. This will be handled naturally
by the (location <--> serial #) table and a history entry for each tube.
For FEE and DCM cards, we may have a "repair history" entry. (A tube is
not repaired if it fails, so there will be no repair history.)
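
Purely as an illustration (this is not the SUNY-SB design), such a history
could be kept as one row per tube per location, with the serial number as
the common key:

    CREATE TABLE rich_tube_location (
        serial_no   INTEGER,       -- common key shared with tables (A), (B), (C)
        location    VARCHAR2(16),  -- position of the tube in the PHENIX detector
        installed   DATE,
        removed     DATE           -- NULL while the tube is still in place
    );
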
The largest demand on the database will come from the calibration data.
Since we have 5120 tubes, and each tube produces at least 8 words (ADC
ped, ADC gain, ADC noise, one-photon resolution, TDC clock, TDC slewing,
TDC t0, HV setting) per calibration, we will have > 40K words = 160K bytes
per calibration. This means we need at least 1 Mb per day of database
storage just for calibration data.

4.4.3 - documentation:
> - Writeups, technical drawings, minutes, photos ...
> - other?
It would be nice to have those documents in a database, too. Is anyone
studying how to do this? If someone in PHENIX sets up a good database for
documents, we will use it.

4.4.4 - input access:
There should be many ways to put the data into the database. The
phototube data sheets from Hamamatsu are supplied as ASCII text, and S.
Salomone has already put them into ORACLE. (I think she used SQL*LOAD.)
The test results from INS and SUNY-SB will be put into ORACLE in a similar
way.

I think direct write access to the database from applications is a policy
issue. We need a method to enter the calibration data determined by the
"calibration modules" of the analysis program. However, this does not
necessarily mean that the calibration modules write the results directly
to the DB. That is a convenient way, but it introduces the possibility
that 'wrong' results are put in the DB. We need to make a policy decision
here. I think the issues to be considered are:
- how to 'certify' the calibration results that are put in the DB
- will all 'calibration data' be kept in the DB? (Is wrong data removed?)
- how to control "write permission" to the DB
- I think we need some "standard keys" for calibrations. What are they:
  run number, time stamp, something else, or a combination?

If we decide that the calibration modules write directly to the DB, we
need a "standard database API". The questions are:
- How is the API defined?
- Who implements it?
Coupled to these are questions about the design of the calibration DB
itself. Namely:
- Who defines the tables (relations) for the DB?
- Who designs/implements/maintains the DB access API?
- How does the design of the DB affect its performance (i.e. speed of
  query, storage space)?
- How is a table definition modified if that becomes necessary? When a
  table is modified, how are the application programs that access it
  modified? How do we keep compatibility between old data and new data?
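
One possible way to attack several of these questions at once (standard
keys, certification, and keeping old data) is sketched below; the names
and numbers are hypothetical, not a proposed design. Every calibration row
carries a validity range and a version, and rows are never overwritten:

    CREATE TABLE rich_tube_cal (
        first_run   INTEGER,    -- validity range in run numbers
        last_run    INTEGER,
        version     INTEGER,    -- bumped on re-calibration; old rows are kept
        certified   CHAR(1),    -- set to 'Y' once the result has been approved
        serial_no   INTEGER,
        adc_ped     FLOAT,
        adc_gain    FLOAT,
        adc_noise   FLOAT,
        tdc_t0      FLOAT
    );

    -- an analysis job picks the newest certified constants covering its run
    -- (tube 100123 and run 1234 are made-up examples)
    SELECT adc_ped, adc_gain, adc_noise, tdc_t0
      FROM rich_tube_cal
     WHERE serial_no = 100123
       AND certified = 'Y'
       AND 1234 BETWEEN first_run AND last_run
       AND version = (SELECT MAX(version) FROM rich_tube_cal
                       WHERE serial_no = 100123
                         AND certified = 'Y'
                         AND 1234 BETWEEN first_run AND last_run);

Write permission on such a table could then be limited to whoever performs
the certification, while read access stays open; the version column also
gives a crude handle on keeping old and new results side by side.
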
4.4.5 - retrieval access:

4.5 - System Engineering and Integration, and Project Management
-----------------------------------------------------------------
4.5.1 - construction data:
SE&I
- Catalog of mechanical drawing numbers, titles, and authors
- Catalog of baselined drawings and documentation
- PHENIX Nomenclature
- Channel characteristics
- Cable and plumbing plant data
- Catalog of electronic systems documentation
- Survey data
- QA documentation (or a catalog of such)
PM
- Index of PHENIX Notes
- List of institutions and participants with pertinent information
  (Phone Book)

Construction and QA data
  QA documentation: a few (5) MB
  Drawing and parts catalog: 2 MB
  Documentation of construction and cable plant: 20 to 100 MB, depending
    on how it is done
  Electronic documentation: several GB in ViewLogic
  Integration model files: 500 MB
  - total . . . . . . . . . . several GB
  - total in a structured DB: 27 MB

4.5.2 - run-time data:
Calibration and Alignment
  Alignment data for all systems in the MFH hall from the BNL survey:
    0.5 MB/year
  - total in a structured DB: 0.5 MB/year

4.5.3 - documentation:
Documentation: Word, Excel and HTML files: a few (10 - 20) MB
  - total . . . . . . . . . . 20 MB
  - total in a structured DB: 0
PM
  Documentation: Word, Excel, and HTML files: a few (10) MB
  (I doubt if much of this stuff will go into a structured database
  system.)
  Phenix notes: these are mostly on paper, but the index and future notes
  may be in electronic form and could go into a DB.

4.5.4 - input access:
4.5.5 - retrieval access:

4.6 - TOF
---------
4.6.1 - construction data:
PMT's
=====
1) Spec of the PMT (manufacturer part #, # of stages, maximum HV rating,
   typical gain, transit time spread)
2) physical size of the PMT
3) 1920 PMT's will be used for TOF. For each PMT we need to record
   A) data sheets (especially blue sensitivity, from Hamamatsu)
   B) initial gain factor measured on the bench
4) We will also have a table of (location <--> serial #), where location
   is the location of the tube in the PHENIX detector.
5) Signal cable length from each PMT (1920)

Scintillators
=============
1) spec of BC404 (rise time, fall time, amount of light with a MIP, etc.)
2) physical size of the two types of scintillator (short and long)
3) physical size of the light guides
4) typical attenuation length

Electronics
===========
1) spec of FEE
2) spec of HV supply
3) spec of DCM
4) ch vs. time constant + offset calibration constants for each FEE ch.
5) a location (PMT) <--> module id table should be stored
6) physical location of the FEE crates
7) signal propagation velocity temperature dependence

Geometry
========
1) physical size of a panel
2) location of each slat on a panel
3) average material thickness in radiation lengths
4) surveyed position of each panel

comments
========
Any major modifications of the system, with dates

4.6.2 - run-time data:
Slow controls
=============
1) temperature and voltages every 2-4 hours (may be related to t0 drift)
2) low voltage and current + fluctuation width every 2-4 hours; to record
   the fluctuation, voltage and current have to be monitored continuously
   but written to the database once every 2-4 hours.
3) sometimes the ARCNET chips need initialization. It would be useful if
   this were monitored and the status stored in the database at the
   frequency of system initialization.
4) system clock frequency in front of the FEE crate

Calibration program output
==========================
1) time resolution of each slat (960) from laser calibration, once per day
2) relative timing difference slat by slat (960), every 2-4 hours
3) t0 after slewing correction (960), every 2-4 hours
4) PMT gains (1920), every week
5) slewing parameters per PMT (1920), every 1-2 weeks
6) light propagation time through the scintillator + position offset per
   slat (960), every 2-4 weeks

Component histories etc.
========================
1) dead/bad channel status by PMT, laser calibration fiber, HV module,
   FEE module, FEE crate, DCM
2) FEE and HV module replacement history
3) ARCNET initialization packets sent and read back (~200 words * 120 FEE
   modules)

4.6.3 - documentation:
We are working on writing up all documents. We do not know what the best
way to post them is. On the WEB?
4.6.4 - input access:
1) The most important thing is that we can get information very quickly
   (less than 1 s per calibration parameter set).
2) It would be nice to have dead channels appear in the event display in a
   distinguishable color; that information should be automatically
   available to the event display.
3) A fancy browser in which you can define columns and scope, with
   searchability.

In AGS-E802 we were managing only one official database file, and that was
the only place where we could try out database programs. It turned out
that people entered values into it on a trial-and-error basis, which
caused a lot of garbage to accumulate in the official data base and slowed
down access unnecessarily. It would be nice to have a database test area
where people can input their data to debug their database retrieval
programs and their database values. Or it might be advantageous to have
one person take care of the database retrieval programs and tailor them to
each subsystem. The main database should be protected from the unnecessary
garbage which tends to accumulate if there is no flow control.

4.6.5 - retrieval access:
It would be nice to have a program which graphically shows the differences
in the calibration parameters from set to set, so that we can spot unusual
behavior of the parameters. Most likely such behavior is caused by a fit
failure.

4.7 - Central Tracking
----------------------
4.7.1 - construction data:
In Central Tracking we have:
1) 48 TEC chambers
2) 40 DC keystones
3) 32 PC chambers
The kinds of data we will have are
A) Wire Tension measurements
B) Wire Position measurements
C) Wire Gain measurements
D) Electronics Gain measurements
E) Electronics Noise measurements
F) Electronics Pedestal measurements
This information will amount to about 30 Mb of data.

4.7.2 - run-time data:
We plan to monitor probably once per hour, but we have not studied this
yet; it is possible that some quantities need to be updated more
frequently (every 10-15 minutes) and others only once per 8-hour shift:
- temperature, pressure, gas flow
- HV, LV, serial information downloaded into the preamps, thresholds etc.
- maybe alignment
We will also have to keep a time history of the correspondence wire number
- preamp channel # - FEM # - DCM # - power supply # - high voltage supply #
- timing module #. We will also store the calibration data in the database,
and this will be the largest amount of data in it. Assuming we need a
calibration file every 30 minutes (TEC performance, for example, depends
on accurate gain equalization), we might have to store 30 Gbytes of data
in the database per year.

4.7.3 - documentation:
I think we would like to keep most of the documents on the WWW, but it
would be nice to have a database able to do a search using our defined
keywords and returning URL addresses. Even technical drawings could be
translated into postscript or PDF and use a similar system. If we only
store pointers, this is very little data (1 Mb?).
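
A pointer catalog of that sort could be as simple as the sketch below
(hypothetical names, shown only to indicate the scale of the thing; one
row per document-keyword pair, for brevity):

    CREATE TABLE phenix_doc_catalog (
        doc_id    INTEGER,
        title     VARCHAR2(200),
        author    VARCHAR2(80),
        keyword   VARCHAR2(40),
        url       VARCHAR2(200)
    );

    -- 'search using our defined keywords and returning URL addresses'
    SELECT title, url
      FROM phenix_doc_catalog
     WHERE keyword = 'drift chamber'
     ORDER BY title;
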
4.7.4 - input access:
We haven't quite made a decision on which database to use during
construction; most likely it will be FoxPro.

4.7.5 - retrieval access:
I think any database is a good tool only if it has an easy enough
interface that most people will be able to use it. It would be nice if we
could access the database information through the WEB, especially the
documentation part. The database used for online/offline calibration and
serial information should have some performance requirements, so that it
will never be the bottleneck while trying to analyze or, even worse, take
data. Also, we definitely want the data to be accessible from analysis
programs. It is not clear to me whether we should have a single database
to store both the calibration and documentation information.