I am astoundingly disappointed with the new XML API Registry.
I’m not going to pretend that I represent more than a tiny fraction of consumers of the OpenGL registry, but I think we could have been accommodated in the transition much better than this. A bit of background would be important, I think:
I have a hobby of writing bindings for C libraries to higher-level languages, usually Lua. I take pride in the quality and “intelligence” of my bindings: they are never 1:1 mappings to the C functions and arguments, but are tailored to the capabilities of the language in question. A good portion of my work involves writing parsing code that goes over the interface declarations of whatever I’m binding in order to automatically generate a large portion of the bindings. Regardless of the target language, the parsing and code-generation bits are usually written in Lua or, increasingly, PowerShell.
Much of what goes into writing a good parser like that involves programming it to know when it can automatically generate a good binding, and when to alert the programmer that it requires more information…additional annotations, if you will. Most of OpenGL requires only some annotations on types in order to auto-generate good bindings to most of the functions…only a few key ones require manual intervention.
A while back, I started on a project to bind OpenGL to Lua, in two versions, a Legacy binding and a Modern binding. The OpenGL specfiles are…were…a godsend, enabling large portions of this to be done automatically in high quality.
The specfiles are rich in semantic information that’s useless to a language like C but truly important to higher-level languages. As the readme file in the XML API SVN says, much of it wasn’t perfect or easy to translate, but that’s okay…it was important and useful anyway. Strong enumerations are a critical feature for generating good high-level bindings, and the specfiles supported that notion. In/Out annotations? By-value, by-reference, by-array annotations? All good. Even the [COMPSIZE(…)] annotations, which the readme admits aren’t machine-translatable, are critically important for projects like this! All of them allow a parser to notify the programmer that additional information is required…and often that information can be added as an annotation in one place and then used to machine-generate code in multiple places!
I don’t see why much of this information couldn’t have been added to the new XML database as optional information, recommendations, ANNOTATIONS, if you will. Moreover, the information that was retained was translated into such a horrible format that it’s even harder to parse than it was before! The new XML format’s declarations for commands are little more than marked-up C declarations, devoid of any information not used in C…which is fine for C programmers, but then, C programmers don’t need to write parsers to generate C headers from the XML API database, because OpenGL provides a Python script for that, AND the resulting C headers!
Perhaps it will help if I compare and contrast, from a parser’s point of view, the kind of information the old specfiles and the new XML database provide:
CallLists(n, type, lists)
return void
param n SizeI in value
param type ListNameType in value
param lists Void in array [COMPSIZE(n/type)]
category VERSION_1_0_DEPRECATED # old: display-list
glxflags client-handcode server-handcode
version 1.0
deprecated 3.1
glxropcode 2
offset 3
What can we see here?
- We know the name of the function is ‘CallLists’, and we don’t have to scan for a needless “gl”, “wgl”, or “glx” prefix and strip it.
- We know the return type is ‘void’ because the entry says so directly; we don’t need to write a partial C parser just to get that info.
- Parameter ‘n’ takes a SizeI, by-value, as an input (we’ll get back to ‘n’ in a moment.)
** SizeI is a semantically-distinct type, representing a…well…a size, and project-specific human annotations relating to the data in the gl.tm file can tell us what kind of argument-verification code to generate for our target language - we can have specific code for verifying SizeI-type arguments.
- Parameter ‘type’ is a ListNameType, by-value, as an input. From parsing the enumfiles, we know that ListNameType is an enumerated type, and our target language handles those differently…moreover, from parsing the enumfiles, we know at least a superset of allowable values for those, and in addition to checking glGetError() afterwards for GL_INVALID_ENUM, we can do our own checking ahead of time. We’ll also be able to generate a more intelligent error message that says which parameter had an invalid enumeration value (some functions have multiple enumeration parameters) and we’ll be able to generate a list of possible valid values. (I’ll return to enumeration types in a bit).
- Parameter ‘lists’, most importantly, tells us that it’s not just taking a pointer…it’s taking, as input, an array of values. Moreover, the annotation [COMPSIZE(n/type)] not only allows us to alert the programmer that manual intervention is required; depending on the target language, we may not need much manual intervention at all. COMPSIZE tells us that both parameter ‘n’ and parameter ‘type’ merely describe the values being passed into ‘lists’. If your target language attaches sufficient type data to the values you pass in to ‘lists’, then both ‘n’ and ‘type’ can be inferred from the ‘lists’ value itself, and a sufficiently-intelligent parser can determine this.
- Lastly, the ‘category’, ‘version’, and ‘deprecated’ values…I can’t speak for others, but for me, they’ve come in useful for generating lists of what I want to bind. I filter by them to pare down what I want to work with.
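To make the points above concrete, here is a minimal Python sketch of reading such a specfile entry. It is an illustration only, not the generator described in this post (which is written in Lua/PowerShell); the function names `parse_spec_entry` and `needs_annotation`, the dictionary layout, and the exact whitespace of the sample entry are assumptions made for the example.

```python
import re

# Sample entry, trimmed from the CallLists declaration quoted above.
SPEC_ENTRY = """\
CallLists(n, type, lists)
    return      void
    param       n       SizeI in value
    param       type    ListNameType in value
    param       lists   Void in array [COMPSIZE(n/type)]
    category    VERSION_1_0_DEPRECATED # old: display-list
    version     1.0
    deprecated  3.1
"""

def parse_spec_entry(text):
    """Split one specfile entry into name, parameters, and other properties."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = re.match(r"(\w+)\((.*)\)", lines[0])
    entry = {"name": header.group(1), "params": {}, "props": {}}
    for line in lines[1:]:
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if not fields:
            continue
        if fields[0] == "param":
            pname, ptype, direction, kind = fields[1:5]
            entry["params"][pname] = {
                "type": ptype,            # semantic type, e.g. SizeI
                "direction": direction,   # in / out
                "kind": kind,             # value / array / reference
                "size": " ".join(fields[5:]) or None,
            }
        else:
            entry["props"][fields[0]] = fields[1:]
    return entry

def needs_annotation(entry):
    """Names of parameters a generator should flag for human attention."""
    return [name for name, p in entry["params"].items()
            if p["size"] and "COMPSIZE" in p["size"]]

entry = parse_spec_entry(SPEC_ENTRY)
print(entry["name"], entry["props"]["return"], needs_annotation(entry))
```

Every field falls out of a whitespace split, and the COMPSIZE marker gives the generator an obvious hook for warning the programmer about ‘lists’.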
Now, let’s look at the equivalent declaration in the new XML API:
<command>
<proto>void <name>glCallLists</name></proto>
<param><ptype>GLsizei</ptype> <name>n</name></param>
<param><ptype>GLenum</ptype> <name>type</name></param>
<param>const <ptype>GLvoid</ptype> *<name>lists</name></param>
<glx type="render" opcode="2" />
</command>
What can we see here?
- With some text parsing, we can see that the return type is ‘void’, and the command name is ‘glCallLists’. We will have to write some code to remove the ‘gl’ from here.
- There is a parameter named ‘n’, of type ‘GLsizei’. This tells us the representation of the input value in C, but doesn’t give us any more semantic information besides the largest possible range of representable values.
- There is a parameter named ‘type’, of type ‘GLenum’. Wow, that’s just pitiful…the only checking we can do is that our target language passed some kind of enumeration value, but we can’t make even the most basic checks to see if it’s the right kind of enumeration value. Strongly-typed enumerations? What are those?
- We have a parameter named ‘lists’. It is of type ‘GLvoid’. Our parser is then expected to parse out the presence of a ‘const’ before the type and a ‘*’ after the type to determine…pretty much nothing. We’ll have to alert the programmer that more information is required, but we can’t give even the most basic hint as to what might be required. Moreover, we don’t even know that the ‘n’ and ‘type’ parameters are merely descriptive of what’s being passed into ‘lists’…our programmer will have to notice this and add annotations so that they are inferred from ‘lists’ rather than generated as actual parameters of the equivalent function in our target language.
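For comparison, a minimal Python sketch of pulling the same command out of the XML, using the standard `xml.etree.ElementTree` module. The helper names `c_decl` and `parse_command` are hypothetical, and the sample is the snippet quoted above with a closing `</command>` tag and plain ASCII quotes so that it parses.

```python
import xml.etree.ElementTree as ET

XML_COMMAND = """\
<command>
    <proto>void <name>glCallLists</name></proto>
    <param><ptype>GLsizei</ptype> <name>n</name></param>
    <param><ptype>GLenum</ptype> <name>type</name></param>
    <param>const <ptype>GLvoid</ptype> *<name>lists</name></param>
    <glx type="render" opcode="2" />
</command>
"""

def c_decl(elem):
    """Flatten mixed text and child tags into (name, C type string)."""
    name = elem.find("name").text
    full = " ".join("".join(elem.itertext()).split())
    # Whatever precedes the identifier is the C type, qualifiers and all.
    return name, full[: full.rfind(name)].strip()

def parse_command(xml_text):
    cmd = ET.fromstring(xml_text)
    name, ret = c_decl(cmd.find("proto"))
    params = []
    for p in cmd.findall("param"):
        pname, ctype = c_decl(p)
        params.append({
            "name": pname,
            "ctype": ctype,
            "is_pointer": "*" in ctype,          # array? out value? unknown
            "is_const": "const" in ctype.split(),
        })
    # The API prefix has to be stripped by hand.
    stripped = name[2:] if name.startswith("gl") else name
    return {"name": stripped, "return": ret, "params": params}

for p in parse_command(XML_COMMAND)["params"]:
    print(p)
```

The XML parser does the tokenizing, but what comes back for ‘lists’ is only the C type string plus two booleans; nothing in the result says it is an input array whose length and element type come from ‘n’ and ‘type’.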
And that’s it. We don’t get any more information than that. This information is only useful to someone generating C headers…all information that would’ve come in useful for generating bindings to higher-level languages has been lost. And why? Did the information really need to be lost? Was it beyond the capabilities of XML to handle?
Moving on to Enumerations…
I said I’d return to COMPSIZE() and parameter ‘type’. I’ll admit that the specfiles are marginally difficult to parse…and yet it took me less than a day to write PowerShell code that goes through all of them, generates the enumerations (along with properties recording which definitions their values were re-used from and which class they belong to), verifies everything, and emits warnings and errors for multiple definitions. So let’s look at ListNameType in the specfiles:
ListNameType enum:
use DataType BYTE
use DataType UNSIGNED_BYTE
use DataType SHORT
use DataType UNSIGNED_SHORT
use DataType INT
use DataType UNSIGNED_INT
use DataType FLOAT
use DataType 2_BYTES
use DataType 3_BYTES
use DataType 4_BYTES
Some quick searching shows that this comprises the entire definition of ListNameType; there are no additions to this definition elsewhere in the specfiles. We know exactly which values are allowed, and we even know their names without the GL_ pseudo-namespace attached. We can look up their definitions in the DataType class, which is a convenient location for our programmer to add some trivial annotations giving the size of the various represented data types, which our parser and code generator can make use of…as well as in any other enumeration type that borrows values from DataType, such as NormalPointerType or PixelType.
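A rough Python sketch of what this enables. Everything here beyond the enum names themselves is an assumption for illustration: the size table stands in for the hand-written annotations on DataType, the typecode mapping is one possible way for a target language to carry type data on an array, and the function names are hypothetical.

```python
from array import array

ENUM_SPEC = """\
ListNameType enum:
    use DataType BYTE
    use DataType UNSIGNED_BYTE
    use DataType SHORT
    use DataType UNSIGNED_SHORT
    use DataType INT
    use DataType UNSIGNED_INT
    use DataType FLOAT
    use DataType 2_BYTES
    use DataType 3_BYTES
    use DataType 4_BYTES
"""

# Assumed hand-written annotation: bytes per element for DataType values.
DATATYPE_SIZES = {
    "BYTE": 1, "UNSIGNED_BYTE": 1, "SHORT": 2, "UNSIGNED_SHORT": 2,
    "INT": 4, "UNSIGNED_INT": 4, "FLOAT": 4,
    "2_BYTES": 2, "3_BYTES": 3, "4_BYTES": 4,
}

# Assumed mapping from Python array typecodes to DataType names.
TYPECODE_TO_DATATYPE = {
    "b": "BYTE", "B": "UNSIGNED_BYTE", "h": "SHORT", "H": "UNSIGNED_SHORT",
    "i": "INT", "I": "UNSIGNED_INT", "f": "FLOAT",
}

def parse_enum_group(text):
    """Return (group name, member names) for one 'Name enum:' block."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    group = lines[0].split()[0]
    members = [ln.split()[2] for ln in lines[1:] if ln.startswith("use ")]
    return group, members

def check_enum(param, value, group, members):
    """Reject a bad enum before the GL call, naming the offending parameter."""
    if value not in members:
        raise ValueError(
            f"parameter '{param}': {value} is not a valid {group}; "
            f"expected one of: {', '.join(members)}")
    return value

def infer_call_lists_args(lists, group, members):
    """Derive 'n', 'type', and byte length from a typed array alone."""
    type_name = TYPECODE_TO_DATATYPE.get(lists.typecode)
    check_enum("lists", type_name, group, members)
    n = len(lists)
    return n, type_name, n * DATATYPE_SIZES[type_name]

group, members = parse_enum_group(ENUM_SPEC)
print(infer_call_lists_args(array("H", [1, 2, 3]), group, members))
```

With the group membership known, a bad value is rejected before the call with a message that names the parameter and lists the valid values, and a binding for CallLists could accept just the array and fill in ‘n’ and ‘type’ itself.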
Now let’s look at the comparable data in the new XML API database…oh wait, we can’t. In fact, the phrase “ListNameType” doesn’t show up anywhere in it.
I think I’ve made my points clear…to close, I’ll reiterate by saying that I’m aware I represent an extraordinarily small group of people, but I still fail to see how anyone thought this bit from README.PDF could’ve been much comfort to anyone:
It would be a big job to go backwards from the XML to .spec formats, and we don’t want to support this or enhance the .spec files going forward. Hopefully, people using the .spec files for other purposes will be able to transition to the XML registry.
It makes me wonder at this point who thinks that transitioning from XML to .spec would be going backwards. The only advantage at this point that the XML data has is that the overall structure of the data has ready-made parsers available…but the data itself, what really matters, has only gone backwards. So much data is missing, and what data remains has actually become harder to parse…surely nobody could believe the old specfile format was harder to parse than full C declarations? And why was so much semantic data thrown away?
At this point, I’ll continue to use the old specfiles to generate bindings up through OpenGL 4.3, because I can’t use the new XML registry for it - it doesn’t have the information for it. I’m quite worried for the future of my project and how it’s going to function when I want it to target ever-newer versions.