OpenGL 2.0: Programmable, Scalable, And Extensible
OpenGL and DirectX represent a schism in the graphics world, one that has hampered development at times, and fostered it at others. There are new developments coming, and the schism appears to be closing somewhat.
Once upon a time, there was an honest attempt undertaken by Microsoft and SGI, the original keeper and nurturer of OpenGL, to bring OpenGL into DirectX. That was the ill-fated Fahrenheit project, which eventually ended in failure. That failure truly broke hearts in the industry. The great hope was to create a universal API that worked within and without Microsoft, one that would enable graphics development to advance within a common framework. It seemed like an impossible task, and it was. Microsoft has been accused of lacking the will, and SGI has been accused of fostering a crippling NIH (not invented here) environment. Whatever the reasons, and there are probably several considering all the fallible humans involved, Fahrenheit is just a memory, and Microsoft has put all of its development resources into making DirectX an API for high-end graphics, as well as game development.
Unlike OpenGL, which as an open standard is updated and refined through a standards body, DirectX is developed to enhance the development of applications for Windows.
Without the need to work within a standards body, and dedicated solely to making Windows look good, DirectX has blossomed. Meanwhile, OpenGL, maintained by the ARB (Architecture Review Board), has stagnated, with the only advances arriving as OpenGL extensions. Among those extensions are graphics capabilities such as ClearCoat and Multisample, along with tools for integrating video and graphics (some of which have been developed through the OpenML effort, itself a sort of extension to the OpenGL ARB).
Extensions, unfortunately, have not proven to be a viable option for the majority of ISVs who want a consistent platform for development. Instead, a dizzying array of extensions has been developed, and many of them are designed for specialized applications. And the ARB has found itself unable to move OpenGL forward for several reasons, including the quicksand of IP discussions. The frustration comes from companies trying to participate in the ARB (design by committee) and offer technology while, at the same time, protecting their IP. The attempt to reconcile these conflicting interests has limited the ability of the ARB to consider new technology for OpenGL. Although the ARB has worked remarkably well over the years, the industry and the pace of consolidation have changed dramatically, putting new pressure on the ARB to keep up, let alone provide leadership.
In fact, the tables have turned. Less burdened by the OpenGL committee-based design approach, DirectX is providing a number of examples of what OpenGL might someday offer. The prime innovation is the promise of programmability included in an API, which, though quite a ways from being widely put to work by developers, is attractive to ISVs and IHVs alike. It means that graphics development can be taken as far as any creative developer wants to take it, and all the effects, looks and features will be accessible by mainstream hardware and software. It also offers that long-sought and oft-promised goal of “off-loading the host CPU.” The members of the OpenGL ARB, for the most part companies that also develop products for DirectX, clearly recognize the desirability of furthering OpenGL: it gives them the ability to deliver their products across hardware platforms.
Many of the members of the OpenGL ARB work with DirectX as well as OpenGL. They clearly recognize the desirability of furthering OpenGL because it offers them the ability to deliver their products across hardware platforms.
But there’s a glitch in DirectX as well, and that is apparent in the evolution of DirectX 8: on the one hand, we have Nvidia’s DirectX 8 with programmable shader version 1.2, and then there’s ATI’s approach, which Microsoft has designated as DirectX 8.1 with programmable shader version 1.4. The two approaches both enable access to programmable hardware, but each uses a different register architecture. This is not a situation that’s inviting to ISVs, either. In theory, DirectX 9 is planned to solve the problem with an approach that is hardware-independent. But notice the renaming of DirectX 10 as DirectX 9.1, which demonstrates that the Microsoft path is not as straightforward as some would hope, either.
Further, with DirectX, Microsoft continues to prioritize the needs of the much higher-volume consumer market over the professional market that OpenGL has served well. Given the more conservative nature of most of the professional graphics application developers, an evolution of OpenGL has great appeal for the professional segment, even on Windows operating systems.
If you threw a rock at Siggraph last year, you might well have hit someone in the industry who would say that Nvidia made a very good run at stealing hardware programmability by convincing Microsoft to adopt its approach into DirectX 8 first. Nvidia was also proactive in educating developers on the use of these new programmable features. ATI opened up the debate with its own approach, and the problem became evident. The ISVs have chimed in now because they’re very aware of the danger of being limited to one hardware vendor for something as critical as graphics programmability. Furthermore, why stop at Windows?
The debate has carried over into OpenGL, where up to now the attempts at bringing programmability into the API have happened through the extensions procedure. This has contributed to the bewildering number of extensions, and the same issues of hardware-centricity were emerging and subsequently complicated by IP issues. (Over 230 OpenGL extensions have been defined. Nvidia’s extension documentation is 500+ pages, while the OpenGL 1.3 specification itself is only 284 pages.) So, OpenGL faces a number of critical issues.
Although today OpenGL can be implemented on a chip, lately the ARB has been working backwards and discussing which features from existing chips to standardize, hardly a positive dynamic for a forward-looking industry standard. OpenGL does not provide hardware-independent access to the new programmable processors, so the current direction is to expose multiple hardware architectures through vendor-specific extensions. However, IP (intellectual property) threats have been holding up broader adoption of these extensions. The question being asked is: are IP issues causing the lack of progress, or are they a symptom of a deeper problem?
An Answer Appears To Be At Hand, And The News Is Encouraging
At the ARB meeting held in September 2001, 3DLabs presented its vision for OpenGL 2.0. In the past, SGI was the de facto leader of the ARB and a bellwether for the next generation of extensions. With SGI wisely spending its resources on reorganizing the company, 3DLabs has taken a more aggressive role in the committee, and it laid the groundwork for OpenGL 2.0 with an ongoing education effort leading up to the meeting in September. 3DLabs’ Neil Trevett, senior vice president of business development, and John Schimpf, director of developer relations, prepared the group with a presentation at Siggraph, as well. By the time the ARB meeting rolled around, the 3DLabs contingent was met by an ARB that was much more willing and eager to get moving than anyone anticipated.
In fact, one of the first orders of business addressed by the ARB was the need to move beyond the ATI and Nvidia vertex shader extension debate. This was efficiently done as both companies huddled and worked out a strategy to develop an interim vertex shader. The process was both facilitated and devoutly wished for by Apple, which uses graphics technology from both Nvidia and ATI. Suddenly, the ARB looks like a forward-moving body with vision and goals.
The group accepted 3DLabs’ offer to develop an architecture for OpenGL 2.0, and they’ve plotted a clear course. As is probably already obvious, the primary and most immediate goal for OpenGL is to enable and exploit hardware programmability. However, OpenGL needs to be updated in other ways, as well. OpenGL was developed in 1992, when hardware was slower and memory management was not a high priority. Now the group is charged with bringing memory management to OpenGL in order to support advanced pixel processing. Also, OpenGL needs to compete with, or complement, DirectX in the area of texture compression, which the ARB calls pixel packing. Then, there is the very real requirement for OpenGL to look forward to embedded graphics. And there, once again, the game is quite different, since embedded graphics are, by definition, fixed, and the applications that use them will have specific and less-sophisticated requirements, at least at first. In fact, one of the most pressing requirements for embedded graphics will be hardwiring Playstation 1 functions into a chip.
Finally, the Khronos Group, the developers and promoters of OpenML, hope to bring it into the OpenGL fold to enable the development of products that combine video functionality with advanced graphics capabilities. Also, the OpenML group, which includes 3DLabs, Sun, Intel, Discreet, Evans & Sutherland, Pinnacle, RealViz and SGI, has ambitions for embedded devices. They hope that, by creating standards for multimedia on Internet devices, they can move beyond porting games to PDAs and enable real graphics, video and interactivity.
The most immediate step in creating an advanced OpenGL standard will be to combine backward compatibility with OpenGL 1.3 and parity with DirectX, incorporating required features such as vertex and pixel processing, and memory management.
In phase one, OpenGL will maintain complete backward compatibility with OpenGL 1.3. In order to accomplish this, OpenGL 2.0 will consist of the existing functionality of OpenGL 1.3, combined with new functionality in a completely compatible way. This is illustrated in Figure 2. The advantage of this process is that it will also start streamlining the tangle of extensions that have grown up around the ARB’s stagnated progress. Also, the implementation of hardware programmability provides a much better path for integrating existing extensions.
The next step will be to synthesize a “Pure OpenGL 2.0” that provides a more streamlined API for developers. This will be accomplished by documenting certain OpenGL features as “legacy” features, and by educating developers to use the more flexible programmable features, rather than the fixed functionality features. This is shown in Figure 3.
3DLabs is already well on the way to defining the spec, and they’ve delivered OpenGL 2.0 white papers.
The plan calls for keeping fixed functionality where flexibility is not required and where hardware implementation is the cheaper, more efficient approach. For example, such functions as frustum clipping, backface culling, viewport mapping, etc., can remain as fixed functions; also, frame buffer operations involving read/modify/write operations are well suited to hardware implementation. The goal is to extend programmability to all OS platforms and all hardware platforms.
Programmability
Programmability is the key word in OpenGL 2.0, and it is designed to be accessible to applications. As befits a standard for a wide range of applications and users, graphics programmability is added through a high-level language similar to those used for CPU programming. It will offer a rich feature set, it will be hardware-independent (as a standard should be), and it will be designed specifically for use within the OpenGL framework.
- Programmable vertex processing will be the most talked about feature. It will replace coordinate transformation, material application, and lighting, and allow arbitrary per-vertex operations.
- Programmable fragment processing is another key feature. It will replace texture access, texture application, and fog, and it will allow arbitrary per-fragment operations, something that developers have long wanted.
- Programmable image formats will replace fixed format image packing and unpacking operations, which will allow arbitrary combinations of type and format when sending pixel data to and from OpenGL.
The idea is to reduce the need for existing and future extensions by replacing complexity with programmability and provide rich, long-lasting functionality.
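To make the idea of programmable vertex processing concrete, here is a minimal sketch in Python of the kind of per-vertex routine that would stand in for the fixed coordinate transformation, material application, and lighting stages. It is not the OpenGL 2.0 shading language, and every name in it (`vertex_program`, `mat_vec`, the parameter names) is an assumption made for illustration. It also skips details a real pipeline needs, such as transforming the normal.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def vertex_program(position, normal, transform, light_dir, material_rgb):
    """Hypothetical per-vertex routine: transform the position, then apply
    a single diffuse light to the material color."""
    clip_pos = mat_vec(transform, position + [1.0])
    n = normalize(normal)
    l = normalize(light_dir)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    color = [min(1.0, c * diffuse) for c in material_rgb]
    return clip_pos, color

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

clip_pos, color = vertex_program(
    position=[1.0, 2.0, 3.0],
    normal=[0.0, 0.0, 2.0],      # deliberately not unit length
    transform=IDENTITY,
    light_dir=[0.0, 0.0, 1.0],
    material_rgb=[0.5, 0.25, 1.0],
)
```

Because the body of `vertex_program` is ordinary code, an application could swap in a different lighting model or add arbitrary per-vertex work without waiting for a new extension, which is the point the list above makes.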
Features
The other functions being incorporated into the new standard include:
- Direct support for multi-pass algorithms under application control
- More flexible frame buffer configuration
- Off-screen rendering within OpenGL
- Application control of texture memory
- Unifying framework for OpenGL objects
- More flexible support for reading and writing image formats
- Cleaner and more efficient synchronization mechanisms
- Application control of buffer swaps
- Use of any color buffer as a texture
This results in a roadmap of OpenGL development and evolution as illustrated by the following chart:
The benefits of this new API will be the standardization of features and improved performance, along with the standardization of existing functionality. Most of the optional OpenGL 1.3 imaging subset is required in OpenGL 2.0, and numerous extensions will be incorporated into standard OpenGL, which will expose the full performance of the hardware. The result will be the ultimate in performance for transfers to and from the graphics subsystem, and more parallelism between CPU and graphics processing. This new API looks like the following diagram:
Functionality
Some of the big points of the new API include:
- Shading Language, a hardware-independent shading language for OpenGL 2.0 that is closely integrated with OpenGL 1.3. The existing state machine is augmented with programmable units that enable incremental replacement of OpenGL 1.3 fixed functionality. The new shading language will provide automatic tracking of existing OpenGL state (e.g., make simple lighting changes without having to rewrite parameter management). It will be C-based, with comprehensive vector and matrix types, and will also integrate some RenderMan features. This language will virtualize pipeline resources so that programmers, for the most part, won’t need to be concerned with resource management. The same language will be used for vertex shaders and fragment shaders, with some specialized built-in functions and data qualifiers for each.
- Vertex Processor capabilities for lighting, material and geometry flexibility. Vertex programs will replace parts of the OpenGL pipeline, such as: vertex transformation; normal transformation; normalization and rescaling; lighting; color material application; clamping of colors; texture coordinate generation; and texture coordinate transformation. However, the vertex shader does not replace the following: perspective projection and viewport mapping; frustum and user clipping; backface culling; primitive assembly; two-sided lighting selection; polymode processing; polygon offset; or polygon mode.
- Fragment Processor capabilities for texture access, interpolator and pixel operation flexibility. OpenGL 2.0 has added fragment processor capabilities, which will replace the following parts of the OpenGL pipeline: operations on interpolated vertex data; pixel zoom; texture access, scale and bias; texture application; color table lookup; fog; convolution; and the color matrix. However, the fragment shader does not replace the following: OpenGL’s shading model; histogram; coverage; minmax; pixel ownership test; pixel packing and unpacking; scissor; stipple; alpha test; depth test; stencil test; alpha blending; logical ops; dithering; or plane masking.
- Pack/unpack operation. The goal of the pack/unpack operation is to convert “application pixels” to a coherent stream of pixel groups. Unpack storage modes are applied before data is presented to the unpack processor. The unpack processor is involved in application-to-OpenGL transfers and the pack processor is involved in OpenGL-to-application transfers, and neither is involved in copy operations. Copies within the graphics subsystem only use the fragment processor.
OpenGL’s existing “pixel transfer” operations are supported by the fragment processor, not the pack/unpack processors. The fragment processor has the capabilities needed for scale, bias, lookup, convolution, etc. And since the ARB doesn’t want to require redundant hardware capabilities, the pack/unpack processors do not need the floating point horsepower of the other programmable units. The primary operations are shift, mask, and convert to/from float, the kind of operations involved in application-to-OpenGL transfers of pixel data. Programs in the pack and unpack processors must be compatible with the current fragment shader and work in conjunction with the fragment processor in order to implement the OpenGL pixel pipeline.
- Data Movement and Memory Management. To enhance performance, data movement must be minimized. The primary types of data in visual processing are vertex data (color, normal, position, user defined, etc.) and image data (textures, images, pixel buffers). The general mechanism is to create, locate, bind, and manage all OpenGL objects through the same interface, whether they are vertex arrays, images, textures, shaders, display lists, or pixel buffers.
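The shift, mask, and convert-to-float operations described for the pack/unpack processors can be illustrated with a small Python sketch. The 16-bit RGB565 layout is chosen here only as a familiar example of a packed application pixel format. The function names are hypothetical and are not part of any OpenGL 2.0 proposal.

```python
def unpack_rgb565(pixel):
    """Unpack one 16-bit RGB565 value into (r, g, b) floats in [0, 1]
    using only shift, mask, and integer-to-float conversion."""
    r = (pixel >> 11) & 0x1F   # top 5 bits
    g = (pixel >> 5) & 0x3F    # middle 6 bits
    b = pixel & 0x1F           # low 5 bits
    return (r / 31.0, g / 63.0, b / 31.0)

def pack_rgb565(rgb):
    """Pack (r, g, b) floats in [0, 1] back into a 16-bit RGB565 value."""
    r, g, b = (min(1.0, max(0.0, c)) for c in rgb)   # clamp to [0, 1]
    return (round(r * 31) << 11) | (round(g * 63) << 5) | round(b * 31)

red = unpack_rgb565(0xF800)
round_trip = pack_rgb565(unpack_rgb565(0x1234))
```

Nothing here needs heavy floating-point arithmetic, which is consistent with the article's point that these processors can be much simpler than the vertex and fragment units.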
Currently, OpenGL memory management is a black box; that is, everything is done automatically. As a result, applications don’t know when an operation will happen, how long it will take, or how much backing store is allocated, and they have no control over where objects are stored. Nor do they have control over when objects are copied, moved, deleted, or packed (defragmented), or any knowledge of how memory resources are virtualized. The end result is that applications currently have a very limited ability to ‘query for space,’ and can only do so for proxy textures. The following diagram illustrates the organization of the current OpenGL memory management.
OpenGL 2.0 will offer better memory management and will give applications control over the movement of data, providing better vertex manipulation, more efficient methods of getting data into OpenGL, and direct access to OpenGL objects. In addition, the memory management features can eliminate copies of the data, facilitating data streaming and greatly enhancing performance.
- With the ability to take control of memory management if they want it, applications will be able to adopt a policy and change it dynamically. However, application developers will have to supply usage hints for each object, communicating to OpenGL what the application knows about the data usage pattern. This will result in a true, “use once, use many times, write only, bind data asynchronously.” It will also provide a unifying mechanism for all objects (e.g., textures, display lists, vertex arrays, images, pixel buffers and shaders). There is no distinction between memory for textures, display lists, vertex arrays, etc., although this does not prevent OpenGL implementations from having object-specific pools. Alternatively, an application can still let OpenGL manage memory; it’s just another policy.
OpenGL 2.0 will use a pinned policy where the application gets control so that OpenGL will not move, copy, pack, or delete an object, once data is loaded. The object is pinned down in high performance graphics memory, which lets the application be responsible for storing or deleting objects, or for initiating a packing operation. The new organization is more efficient, as illustrated by the following diagram.
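The difference between letting OpenGL manage memory and pinning an object can be modeled with a toy Python class. This is only a sketch under stated assumptions: `ObjectPool`, its `load` method, and the oldest-first eviction rule are all invented here for illustration and do not correspond to any proposed OpenGL 2.0 call.

```python
from collections import OrderedDict

class ObjectPool:
    """Toy model of graphics memory holding named objects (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = OrderedDict()   # name -> (size, pinned), oldest first

    def used(self):
        return sum(size for size, _ in self.objects.values())

    def load(self, name, size, pinned=False):
        """Load an object, evicting unpinned objects (oldest first) if needed.

        Returns the names evicted. Raises MemoryError, leaving the pool
        unchanged, if pinned objects leave too little room.
        """
        evictable = [n for n, (_, p) in self.objects.items() if not p]
        evicted = []
        free = self.capacity - self.used()
        while free < size and evictable:
            victim = evictable.pop(0)
            free += self.objects[victim][0]
            evicted.append(victim)
        if free < size:
            raise MemoryError(f"no room for {name!r}")
        for victim in evicted:
            del self.objects[victim]
        self.objects[name] = (size, pinned)
        return evicted

pool = ObjectPool(capacity=100)
pool.load("terrain_texture", 60, pinned=True)   # application-controlled
pool.load("hud_texture", 30)                    # left to the pool's policy
evicted = pool.load("particle_texture", 40)     # forces an eviction
```

In this model the pinned object is never moved or deleted behind the application's back, so the application itself must free it before a larger object can be loaded. That is the responsibility the pinned policy hands over.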
- Asynchronous OpenGL. By adding asynchronous operations to OpenGL, OpenGL 2.0 will offer better parallelism and synchronization capabilities. OpenGL traditionally worries about how and where, not when. There has been a long-standing need for better mechanisms that know and control when things happen. Parallelism allows work to continue while something else is finishing; for example, the depth buffer can be cleared while waiting for a swap.
OpenGL needs a generalized and unified synchronization mechanism that will eliminate the need for each extension to invent its own mechanism. This will: solve short-term problems such as parallel image download; enable more sophisticated use of timing control in the future; enable applications to dictate the timing policy; control when things are going to happen; and allow parallel execution of longer OpenGL operations. These long operations can “run in the cracks” with respect to main rendering, which will provide better replacements for flush and finish.
The net result will be that OpenGL 2.0 will have generalized time control capabilities that will improve parallelism and control of operation timing.
- GLsync. GLsync is a new OpenGL data type that will provide a unified synchronization mechanism. It can be viewed as a handle. Today, OpenGL provides allocate and de-allocate functions, but no application-managed ID space. In the new version, a thread can wait on a GLsync using an OpenGL wait function, and the graphics driver can unblock a thread waiting on a GLsync. GLsync can be used very broadly for things like: rendering fences; asynchronous data binding; event notification (e.g., vertical blank); triggered swap completion; and new things not thought of yet. It will also provide optional integration into the underlying operating system’s synchronization primitives.
A rendering fence synchronizes a thread to a stream of rendering commands; it is a superset of the “Finish” functionality. “Finish” stalls the host thread, during which no new rendering can be issued. OpenGL will be able to use GLsync for such notifications. A thread can issue a fence, issue more commands, then wait on the GLsync. The wait will unblock when the rendering preceding the fence is finished. This allows parallelism that Finish did not.
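The issue-fence, issue-more-commands, then-wait pattern can be simulated in Python with a worker thread standing in for the graphics hardware. This is a sketch of the synchronization idea only. `CommandStream` and its methods are hypothetical, and a `threading.Event` plays the role the article assigns to a GLsync handle.

```python
import queue
import threading

class CommandStream:
    """Toy command stream drained by a worker thread (hypothetical)."""

    def __init__(self):
        self._commands = queue.Queue()
        self.completed = []   # command names, in completion order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            command = self._commands.get()
            if command is None:                       # shutdown marker
                break
            if isinstance(command, threading.Event):  # a fence
                command.set()   # everything issued before it is done
            else:
                self.completed.append(command)

    def issue(self, name):
        """Queue a rendering command; returns immediately."""
        self._commands.put(name)

    def fence(self):
        """Queue a fence and return the handle to wait on."""
        sync = threading.Event()
        self._commands.put(sync)
        return sync

    def shutdown(self):
        self._commands.put(None)
        self._worker.join()

stream = CommandStream()
stream.issue("draw_scene")
sync = stream.fence()
stream.issue("draw_overlay")        # more work issued after the fence
reached = sync.wait(timeout=5.0)    # blocks only until the fence is reached
done_at_fence = list(stream.completed)
stream.shutdown()
```

Unlike a Finish-style call, the host thread here was free to queue `draw_overlay` before waiting, and the wait guarantees only that the commands issued ahead of the fence have completed.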
Several Additional Functions And Capabilities Are Being Added, Such As:
- FlushStream. The current specification allows too much freedom in keeping data buffered. Flush is overused, and that is pushing drivers to optimize it. A real flush is needed when an application is done issuing rendering commands: buffered rendering should start in parallel with the application doing something else, and before waiting on a GLsync, the application needs to ensure that buffered commands will be processed. FlushStream solves these problems.
- Asynchronous Data Binding. This will provide functions that return before data is copied, allowing host parallelism during data binding. A thread can do something else while OpenGL gets data. Memory must not be modified until OpenGL access completes, which requires notification of when it is safe to modify memory.
- Background Stream. The existing asynchronous data binding still prevents parallel rendering. With a background stream, a separate stream can issue asynchronously bound commands at lower priority than normal rendering. OpenGL commands can still be issued into the foreground stream, which is the normal command stream within a context today. The foreground stream is higher priority and takes precedence over the background stream.
- Vertical Blank Notification. Seemingly an obvious thing to have, vertical blank notification allows OpenGL to synchronize code with video output.
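The foreground and background streams described in the list above amount to a two-level priority scheme, which the following Python sketch models. The class and method names are assumptions for illustration, and a real implementation would interleave work at a much finer grain than whole commands.

```python
from collections import deque

class TwoStreamScheduler:
    """Toy scheduler: foreground commands always run before background ones."""

    def __init__(self):
        self.foreground = deque()
        self.background = deque()

    def issue(self, command, background=False):
        """Queue a command on the chosen stream."""
        target = self.background if background else self.foreground
        target.append(command)

    def next_command(self):
        """Return the next command to execute, or None if both are empty."""
        if self.foreground:
            return self.foreground.popleft()
        if self.background:
            return self.background.popleft()
        return None

    def drain(self):
        """Execute everything queued and return the execution order."""
        order = []
        command = self.next_command()
        while command is not None:
            order.append(command)
            command = self.next_command()
        return order

scheduler = TwoStreamScheduler()
scheduler.issue("upload_texture_a", background=True)
scheduler.issue("draw_frame_1")
scheduler.issue("upload_texture_b", background=True)
scheduler.issue("draw_frame_2")
order = scheduler.drain()
```

The uploads run only when no rendering is pending, which is one way to read the article's phrase about long operations that "run in the cracks."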
Right when it seemed darkest, the graphics community has proven itself to be more committed to the cause of graphics than to the cause of its own platforms and products. Even Microsoft is showing an interest in extending the usefulness of OpenGL, and even more heartening is the willingness of major hardware vendors to play along by opening up their IP where they had earlier shown an inclination to clamp down. And both ATI and Nvidia have committed to making their technologies work together.
One of the key points stressed by the ARB is that the “open” needs to go back into OpenGL. The group has pledged that all ideas submitted for OpenGL, if adopted, are then open for use and not licensable as IP. Sure, there are two edges to this sword, but so far what we’re seeing is a recognition by the ARB that graphics, good-wonderful-shimmering-FX-laden-graphics, is pretty much the whole point, and a commitment to that is a commitment to success in graphics.