Some Example Serializations of the Quantity

=======================================
FOREWORD
=======================================

The purpose of this document is to show some valid XML serializations for an IVOA quantity. Before presenting the examples, a few points should be made, as the XML serialization/examples can only go so far to describe what is intended, or may obscure important points/issues.

1. There are 3 types of quantity, "basic", "core" and "standard", and in that order each extends the node structure of the prior type. Extension is via accretion of XML components (whether child nodes or attributes). Thus, as components are optional or have sensible defaults, you should be able to "upcast" a more primitive type of quantity into a higher one. For example, the serialization of a coreQuantity describes both a coreQuantity and a standard quantity (with some implicit defaults). Conversely, if an "advanced" level quantity is fairly simple (such as a core quantity that holds only one scalar number) then its representation in XML will look exactly like its parent realization, the basic quantity.

The components of these types of quantity are given elsewhere, but are summarized here:

    Basic Quantity [qid, name, description]
        UCD
        CoordSystem
        Units
        DataType
        Value
        Accuracy

    Core Quantity [qid, name, description, size]
        UCD
        CoordSystem
        Units
        DataType
        Value | Values | Members
        AltValues
        Accuracy

    Std Quantity [qid, name, description, size]
        UCD
        CoordSystem
        Units
        AxesList
        DataType
        Value | Values | Members
        AltValues
        Accuracy

All quantities additionally have an XML id "qid" as well as name and description attributes. A special node, "refQuantity", can be used to reference any quantity (by id) at another point in an XML document. This mechanism is intended to allow compression of an XML document where one or more quantities are repeated.
The multi-dimensional quantities "Core" and "Standard" have a required attribute of "size" which indicates how many _values_ are contained within the quantity.

2. XML serializations of quantities may have the following valid XML node names (excepting extensions the user may make to the schema) with the corresponding XML schema types:

    XML Node Name     XML Schema Complex Types allowed
    -------------     --------------------------------
    basicQuantity     basicQuantityType
    coreQuantity      coreQuantityType
    stdQuantity       stdQuantityType
    quantity          stdQuantityType

A user should be able to create XML schemata from these types. The new nodes may have limits placed on the quantities such that units, UCD, etc. are limited to preferred settings. For example, the "velocity" element constructed from a "stdQuantityType" may be limited to having units of type "cm/sec^-1" and a UCD for "velocity". The schema would give this "extended" quantity the node name "velocity" in order to identify it in the XML document. In all other aspects, it behaves as an unrestricted quantity. Examples of how to do this in XML schemata will be given in another document.

3. Within a type of quantity, there are a number of simplifying defaults which, for the largest fraction of data, will serve to compress the XML which need be passed; some possible simplifications are noted below. Thus, when some attributes are not specified, they are assumed to fall back to some default. For example, if dataType isn't specified, it is assumed to be "a scalar string"; if "units" are not specified, they are assumed to be "unitless". To show a *possible* choice, the following examples *could* be considered equivalent:

    =and=

    = or even =

(an XML note for newbies: the CDATA section is needed to preserve spacing) where the string " my data" holds two items of string data, " my" and "data".
In this case, we must assume that the length, since unspecified (no dataType node), may be gleaned from the "size" on the quantity and the spaces in the CDATA (where, if size is unspecified or 1, it's assumed the spaces *are* part of the string, and if size > 1 then the spaces between the non-space characters are NOT part of the value).

4. Units are represented as strings. There are a variety of standards out in the community right now which we don't want to get embroiled in. Thus, for the time being all examples just have filler "units" elements and the actual strings shouldn't be taken too seriously.

5. Where possible, the structure of the XML serialization should help control how users express their data so that it's "scientific"; BUT this is NOT a requirement/priority on the serialization. It becomes hard to impossible to formulate a serialization which is richly expressive, has re-usable components AND is scientifically policing, all at the same time. Some police work, of course, can and should be done, but we should not lose sight of the fact that having a sufficiently rich serialization which can be transmitted and understood in less than the age of the universe is the priority here.

6. We allow 2 prescriptions for serialization of single scalar string meta-data in the model: either as PCDATA within a child node or as an attribute on the parent node. Why both? The first is so that the user can control whitespace, as well as allowing for some re-use of the quantity model. The second is because, when whitespace isn't an issue, attributes allow the user to have a '1-line' serialization which is more compact in form than using child nodes. Thus, given the advantages of each, we allow the user to have either (but *not both* on the *same* node). This means that the following are "equivalent" prescriptions:

    John Q. User ...

Meta-data in the model which certainly "qualifies" for this dual treatment includes "name", "description", and "UCD".
As "size" is an integer attribute, it does NOT have this mechanism. Furthermore, you are not allowed to use both prescriptions within the same element, e.g.

    Joe ...

is clearly illegal by this rule.

7. Populating the data-cube (or matrix) of the quantity from the serialized "values" sections is assumed to be done in an implicit ordering. The rule is that the order of the axes in the serialization indicates the "fast" to "slow" axes. For example, the following 2 dimensional matrix:

    A B C
    D E F

is described by the vertical/horizontal i,j axes which have respective sizes "2" and "3". The serialization of this matrix and its data would then be:

    0 1 2
    0 1
    A B C D E F

where the "j" (horizontal) axis is the 'fast' axis. Thus, the locations in the matrix are populated in [i,j] order of [0,0], [0,1], [0,2], [1,0], [1,1] and [1,2]. By reversing the order of the axes, we would get the data into the matrix as:

    A C E
    B D F

Because data are stored as strings in a PCDATA section, we are assuming that whitespace is unimportant and, as in the XML spec, any multiple whitespace will be collapsed into single whitespaces at parse time. For the purposes of the "implicit" rule, all whitespace indicates a delimiter between values.

This rule presents an obvious problem of how to encapsulate multi-dimensional string data that has spaces/tabs/etc. embedded within the value. Clearly, in order to treat this case, we must resort to some fixed width or alternative delimiter symbol rule for parsing the values, and this is not currently given. As an alternative delimited parsing rule, a good choice, for example, is to separate fields within the values via some XML tags, e.g. a 3 field string could be:

    whitespace doesn't matter here
    Or in this field either

Clearly, CDATA sections may be used between these tags to preserve whitespace within fields, and the API will have to allow this control. It is also clear that this is not the most compact/efficient means of transporting (or parsing!)
the values, so some sort of fixed width parsing solution is also desirable as an alternative, especially for passing along large datasets. Serializations presented here will need to have additional, explicit IO information in order to have the ability to wrap outside files (whether ASCII or binary) or to do fixed width parsing. We leave this for (near) future work.

8. Equivalence principle between single and multi-dimensional quantities: It may be desirable to create a serialization where something built in the multi-dimensional way would have "equivalence" in the atomic representation. For example, the matrix:

    A B
    C D

which has a 2-d velocity, e.g. V(x,y), would look like this in the multi-dimensional representation:

    cm/sec^-1
    velocity:ucd
    10.1 12.1
    10.2 14.2
    A B C D

This is really a compact description of 4 atomic (aka Basic) quantities, each with its own meta-data which are also atomic in nature, e.g.

    10.1 10.2 cm/sec^-1 A
    12.1 10.2 cm/sec^-1 B
    10.1 14.2 cm/sec^-1 C
    12.1 14.2 cm/sec^-1 D

So what has happened in order to make the constituent atomic quantities is that the parent matrix quantity has been broken apart, with the axes quantities turned into meta-data in each atomic quantity. In principle, by analyzing the handful of atomic quantities, they could be reassembled into the matrix quantity again, if that is desired, but this is a relatively difficult task to do in general, and it is not required that a parser of the serialization be able to do this "reverse" composition task.

9. Referencing mechanism. We will need to consider id/idref mechanisms for referencing one quantity within a document by another. In particular, the use of the reference mechanism in order to "compress" information repeated across quantities is highly desirable.
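
The implicit ordering rule of point 7 can be illustrated with a short Python sketch. This is not part of any quantity API: the function names `populate` and `as_rows`, and the idea of passing the axes as (name, size) pairs, are assumptions made purely for illustration. The sketch splits the values on whitespace and treats the first listed axis as the fastest varying one.

```python
def populate(axes, text, key_order):
    """Assign whitespace-delimited values to matrix cells.

    axes      -- (name, size) pairs in serialization order, fast axis first
    text      -- the string holding the serialized values
    key_order -- axis names in the order wanted for each cell's index tuple
    """
    values = text.split()  # any run of whitespace is a single delimiter
    expected = 1
    for _, size in axes:
        expected *= size
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")

    cells = {}
    for position, value in enumerate(values):
        rest = position
        index = {}
        for name, size in axes:  # peel off the fastest axis first
            rest, index[name] = divmod(rest, size)
        cells[tuple(index[name] for name in key_order)] = value
    return cells


def as_rows(cells, n_rows, n_cols):
    """Lay out [i, j]-keyed cells as a list of rows for display."""
    return [[cells[(i, j)] for j in range(n_cols)] for i in range(n_rows)]


# "j" listed first (fast axis), then "i": the first matrix of point 7.
j_fast = populate([("j", 3), ("i", 2)], "A B C D E F", ("i", "j"))
# Axes reversed, so "i" is now the fast axis: the second matrix of point 7.
i_fast = populate([("i", 2), ("j", 3)], "A B C D E F", ("i", "j"))

print(as_rows(j_fast, 2, 3))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
print(as_rows(i_fast, 2, 3))  # [['A', 'C', 'E'], ['B', 'D', 'F']]
```

Because the values are split on whitespace, this sketch shares the limitation noted in point 7: string values with embedded spaces would need tagged or fixed width parsing instead.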
=======================================
EXAMPLES
=======================================

With these working assumptions, and with the agreements on the interface and theoretical structure of the quantities, the following use-cases will be serialized to show examples of how we might do things.

Summary of use-cases:

    1. A single scalar value
    2. A single vector value with 2 components
    3. A single tuple value
    4. A single object as data of a quantity
    5. A list of string (scalar) values
    6. A 2-d quantity
    7. A 3-d quantity
    8. A 2-d quantity described with alternative axes
    9. A single scalar value with alternative value frame
    10. A 2-d quantity of scalar values with alternative value frame
    11. A 2-d quantity of vector values
    12. A 2-d tabular quantity

1. Single Scalar Value

This can, of course, be handled by the basic quantity and above. To give an idea of compression/defaults, the following examples are equivalent:

    Author name of Author John Q. Public

    =or=

    John Q. Public

    =or=

    name of Author John Q. Public

2. A single vector value with 2 components

Depending on what we want to allow in DataType for basicQuantity, single vectors may be handled by all quantities. In this example, the "dataType" is "vector", a node which requires one or more "value frame" components to describe the vector. In the following example the velocity vector "10.34x, 12.81y" is described. Note one issue with this approach: the parent quantity units can be different from, or in conflict with, those of the child value frames. The solution is to put in the schema a rule that the units node doesn't appear when the dataType is of sub-type "vector".

    10.34 12.81

3. A single tuple value

Here, we have chosen to express the tuple as the fairly common celestial sky quantity. When the information in the quantity is a "tuple" (association of one quantity to one or more other quantities), then the "values" + "dataType" nodes are replaced with a "members" node.
This will prevent people from expressing tuples as obj1 + obj2 + number. Thus, the tuple prescription looks like:

    _whatever_appropriate_here
    _whatever_appropriate_here

In order to make this example clear, let's show the referenced Ra/Dec quantities:

    ra_ucd_in_here
    COORD_SYSTEM STUFF HERE
    hr:arcmin:arcsec
    hr arcmin arcsec
    20 30 49.23

    de_ucd_in_here
    COORD_SYSTEM STUFF HERE
    deg:min:sec
    deg min sec
    -89 30 21.22

4. A single object as the data of a quantity

In this example, the member is any old object you like (rather than a quantity). This may be out of spec for the quantity just now, but nevertheless it's interesting to consider. (Note: as this example is conceived, you would need to extend the quantity parser in order to load the object serialization). XML binding may also be used to, at least, create a working "data" container (in other words, it will be an object with accessor methods corresponding to the information presented, but no other methods that the class may have). If that extension isn't present, then based on the mode the quantity parser is running in, a warning or error might be thrown (or the aforementioned simple XML binding could be used).

    == XML serialization of the object in here ==

5. A list of 3 string (scalar) values (values: "John ","Q. ", and "Public")

This example shows how a one dimensional list with an "ith" index may be done. This is certainly the realm of coreQuantity and above. We can access this quantity using q.getValue(i), where q.getValue(0) == "John ", and so on. Note: "size" defaults to "1", so we *must* specify it for this example.

As previously noted, XML treats whitespace in a special way, and will collapse the whitespace in any "unprotected" string (e.g. not in a CDATA section) to a single space per group of whitespace characters. As such, we must use the CDATA section below in order to preserve the characters of the original string.
But this alone is not enough, as we will also need some additional specification (that currently doesn't exist) to direct how to parse this fixed width string. Instead of fixed length strings, we can use tagged values for variable length strings:

    John Q. Public

In a more sophisticated formulation, which requires the _standard quantity_, we could declare the list with a string index, so that the quantity can also be accessed by a value (tickmark) on the axis, i.e. q.values["first"] == "John " (or so on, as appropriate to the API):

    first middle last
    John Q. Public

6. A 2-d quantity

This is in the purview of the standard quantity only. In the example below, a 3x3 matrix of numbers is shown, where the matrix is:

    0 1 2
    3 4 5
    6 7 8

    0 1 2
    0 1 2
    0 1 2 3 4 5 6 7 8

To make this an array, say for a CCD with pixel axes (where there may be an offset to the starting pixel value), we then need:

    count/sec^-1
    0 1 2
    0 1 2
    10 11 12
    1 2 3
    10.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8

But the first coordinate frame, with "i-index" and "j-index", is more or less extraneous now, as is the need to have "axesList", since there is only one coordinate frame. So it is equivalent to say:

    count/sec^-1
    10 11 12
    1 2 3
    10.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8

Since these x and y axes values can be generated by a simple algorithm, we can do:

    count/sec^-1
    10.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8

This generalizes to larger arrays more easily.

7. A 3-d quantity

This is a trivial extension of the 2-D case: simply add in more axis quantities to raise the dimensionality. From the above, a 3-D cube (of extent 3x3x2), where the 2-D count rate data are ordered in time, looks like:

    count/sec^-1
    10 11 12
    1 2 3
    TiME FRAME HERE
    sec
    10.1 20.1
    10.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8
    0.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8

8. A 2-d quantity described with alternative axes

The classic example here is a set of 2-D fluxes which are described by both pixel and sky coordinate frames.
Adopting the example previously shown, we then have:

    erg/cm^-2/sec^-1
    10 11 12
    1 2 3
    TiME FRAME HERE
    sec
    10.1 20.1
    deg RA-ucd-here as-appropriate
    40.21 41.77 43.12
    deg DE-ucd-here as-appropriate
    40.21 41.77 43.12
    TiME-FRAME-HERE
    sec
    10.1 20.1
    10.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8
    0.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8

9. A single scalar value with alternative value frame

Alternative value frames are only allowed under the standard quantity. Here is an example where count rate and flux values exist in a quantity as its data. Note: the children *must* be coreQuantities; standard quantities aren't allowed.

    count/cm^-2/sec^-1
    10.12
    erg/cm^-2/sec^-1
    10.12

10. A 2-d quantity of scalar values with alternative value frame

    count/cm^-2/sec^-1
    deg RA-ucd-here as-appropriate
    40.21 41.77
    deg DE-ucd-here as-appropriate
    40.21 41.77
    10.12 39.99 40.2 18.81
    erg/cm^-2/sec^-1
    190.12 139.99 349.10 80.1

11. A 2-d quantity of vector values

Here's a fun one. This one is a quantity describing velocities at 4 points in x,y space. An issue: we need to be able to show that the velocity vectors are related to the positional axes, if desired. Perhaps the "coordSystem" node is appropriate, using an id/idref mechanism?

    m/s^-1
    as-appropriate
    as-appropriate
    kpc as-appropriate
    40.21 41.77
    kpc as-appropriate
    40.21 41.77
    10.34 12.81 30.12 -45.13 60.2 0.02 76.3 43.4

Or, with mapping,

    erg cm^-2 sec^-1
    TiME FRAME HERE sec
    POS.EQ J2000/ICRS deg
    131.2181 -31.1284 512.1 512.1 -0.0016 0.0016
    48.3121 UTC d 131281.4 -.00013 4.823
    10.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8
    0.12 12.34 20.34 13.87 24.76 5.67 6.80 .7 12.8

12. A 2-d tabular quantity

Another fun one. To make a table, we leverage the membership ability of quantities. Each column in the table is a child member quantity. Access by row is indicated by giving the parent quantity a common row axis that all child quantities refer to.

An example serialization of a table using quantity.
The data are:

     1  Berkeley58    41    7    2  Florov,B.H.,Izv.
     2  IC1805       354  135  -30  Vasilevskis.S.et al.,
     3  IC4665       275   87  -85  Sanders,W.L.,
     4  IC4756       464  166   74  Herzog,A.D.et al.,
     5  NGC129(A)     70   18   16  Lenham,A.P.,
     6  NGC1664      222  135   57  Kerridge,S.J.et al.,
     7  NGC1817      752  265   65  Tian K.P.et al., Annals of
     8  NGC188       228  136   -3  Upgren,A.R.,
     9  NGC1912      998  172   87  Mills,G.A., Journal des
    10  NGC2099(A)   243  216   81  Jefferys,W.H.,

taken from Catalog 1215, Zhao & Tian, "Tables of membership for 43 open clusters (1994 Version) (1995)".

    01 1 2 3 4 5 6 7 8 9 10
    Berkeley58 IC1805 IC4665 IC4756 NGC129(A) NGC1664 NGC1817 NGC188 NGC1912 NGC2099(A)
    41 354 275 464 70 222 752 228 998 243
    7 135 87 166 18 135 265 136 172 216
    deg
    2 -30 -85 74 16 57 65 -3 87 81
    Florov,B.H.,Izv.
    Vasilevskis.S.et al.,
    Sanders,W.L.,
    Herzog,A.D.et al.,
    Lenham,A.P.,
    Kerridge,S.J.et al.,
    Tian K.P.et al., Annals of
    Upgren,A.R.,
    Mills,G.A., Journal des
    Jefferys,W.H.,
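
The column-wise layout of the table, with access by row through a shared row axis, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `table_row` and all of the column names are hypothetical (the source table gives no column headings), and only the first three rows of the data are used.

```python
def table_row(columns, row):
    """Return one table row by reading the same position from every column.

    columns -- mapping of column name to its list of values (one member
               quantity per column, all sharing the same row axis)
    row     -- zero-based position along the common row axis
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all member columns must share the row axis length")
    return {name: values[row] for name, values in columns.items()}


# First three rows of the example data; the column names are placeholders.
columns = {
    "index": [1, 2, 3],
    "cluster": ["Berkeley58", "IC1805", "IC4665"],
    "col_a": [41, 354, 275],
    "col_b": [7, 135, 87],
    "col_deg": [2, -30, -85],
    "reference": ["Florov,B.H.,Izv.", "Vasilevskis.S.et al.,", "Sanders,W.L.,"],
}

print(table_row(columns, 1))
```

The length check reflects the requirement that every child quantity refers to the same row axis: a column that is one value short cannot be read by row.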