The intent is that most VDPAU functionality exists and operates identically across all possible Windowing Systems. This functionality is the Core API.
However, a small amount of functionality must be included that is tightly coupled to the underlying Windowing System. This functionality is the Window System Integration Layer. Possible examples include creation of the initial VdpDevice handle, since this act requires intimate knowledge of the underlying Window System, and conversion of VDPAU surfaces to or from native Window System surface types.
Compressed video data originates in the application's memory space. This memory is typically obtained using malloc, and filled via regular file or network read system calls. Alternatively, the application may mmap a file.

The compressed data is then processed using a VdpDecoder, which will decompress the field or frame, and write the result into a VdpVideoSurface. This action may require reading pixel data from some number of other VdpVideoSurface objects, depending on the type of compressed data and field/frame in question.
If the application wishes to display any form of OSD or user-interface, this must be created in a VdpOutputSurface.
This process begins with the creation of VdpBitmapSurface objects to contain the OSD/UI's static data, such as individual glyphs.
VdpOutputSurface rendering functionality may be used to composite together various VdpBitmapSurfaces and VdpOutputSurfaces into another VdpOutputSurface.
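For instance, a glyph could be blended over a UI surface roughly as follows (a sketch; the surfaces, destination rectangle, and the vdp_output_surface_render_bitmap_surface entry point are assumed to exist already):

    /* Sketch: "source over" blend of one glyph onto the UI surface.
     * blend_constant is left unset, as none of these factors use it. */
    VdpOutputSurfaceRenderBlendState blend;
    blend.struct_version                 = VDP_OUTPUT_SURFACE_RENDER_BLEND_STATE_VERSION;
    blend.blend_factor_source_color      = VDP_OUTPUT_SURFACE_RENDER_BLEND_FACTOR_SRC_ALPHA;
    blend.blend_factor_destination_color = VDP_OUTPUT_SURFACE_RENDER_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
    blend.blend_factor_source_alpha      = VDP_OUTPUT_SURFACE_RENDER_BLEND_FACTOR_ONE;
    blend.blend_factor_destination_alpha = VDP_OUTPUT_SURFACE_RENDER_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
    blend.blend_equation_color           = VDP_OUTPUT_SURFACE_RENDER_BLEND_EQUATION_ADD;
    blend.blend_equation_alpha           = VDP_OUTPUT_SURFACE_RENDER_BLEND_EQUATION_ADD;

    vdp_output_surface_render_bitmap_surface(
        ui_surface, &dest_rect,      /* destination VdpOutputSurface   */
        glyph_surface, NULL,         /* entire source VdpBitmapSurface */
        NULL,                        /* no per-vertex colors           */
        &blend,
        VDP_OUTPUT_SURFACE_RENDER_ROTATE_0);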
Once video has been decoded, it must be post-processed. This involves various steps such as color space conversion, de-interlacing, and other video adjustments. This step is performed using a VdpVideoMixer object. This object can not only perform the aforementioned video post-processing, but also composite the video with a number of VdpOutputSurfaces, thus allowing complex user interfaces to be built. The final result is written into another VdpOutputSurface.
Note that at this point, the resultant VdpOutputSurface may be fed back through the above path, either using VdpOutputSurface rendering functionality, or as input to the VdpVideoMixer object.
Finally, the resultant VdpOutputSurface must be displayed on screen. This is the job of the VdpPresentationQueue object.
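A minimal sketch of that final step (assuming the presentation queue and its entry point already exist; per the VdpPresentationQueueDisplay documentation, a clip size of 0x0 displays the entire surface, and an earliest_presentation_time of 0 requests display as soon as possible):

    /* Sketch: queue the finished frame for display. */
    vdp_presentation_queue_display(queue, out_surface,
                                   0, 0,  /* clip_width, clip_height    */
                                   0);    /* earliest_presentation_time */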
The key technology behind this is the use of function pointers and a "get proc address" style API for all entry points. Put another way, functions are not called directly via global symbols set up by the linker, but rather through pointers.
In practical terms, the Window System Integration Layer provides factory functions which not only create and return VdpDevice objects, but also a function pointer to a VdpGetProcAddress function, through which all entry point function pointers will be retrieved.
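For example, under X11 this might look roughly as follows (a sketch using the vdp_device_create_x11 factory from the X11 integration layer; x11_display and x11_screen are assumed to come from the application's existing X11 setup):

    #include <vdpau/vdpau.h>
    #include <vdpau/vdpau_x11.h>

    /* Sketch: create a device through the X11 integration layer, then
     * retrieve a Core API entry point by function ID. Every other entry
     * point is obtained the same way via its VDP_FUNC_ID_* value. */
    static VdpStatus get_entry_points(Display *x11_display, int x11_screen)
    {
        VdpDevice          device;
        VdpGetProcAddress *get_proc_address;
        VdpStatus st = vdp_device_create_x11(x11_display, x11_screen,
                                             &device, &get_proc_address);
        if (st != VDP_STATUS_OK)
            return st;

        VdpVideoSurfaceCreate *video_surface_create;
        return get_proc_address(device, VDP_FUNC_ID_VIDEO_SURFACE_CREATE,
                                (void **)&video_surface_create);
    }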
However, the above scheme does not work well in the context of separated Core API and Window System Integration Layer. In this scenario, one would require a separate wrapper library per Window System, since each Window System would have a different function name and prototype for the main factory function. If an application then wanted to be Window System agnostic (making final determination at run-time via some form of plugin), it may then need to link against two wrapper libraries, which would cause conflicts for all symbols other than the main factory function.
Another disadvantage of the wrapper library approach is the extra level of function call required; the wrapper library would internally implement the existing "get proc address" and "function pointer" style dispatch anyway. Exposing this directly to the application is slightly more efficient.
Note, however, that this simply guarantees that internal VDPAU state will not be corrupted by thread usage, and that crashes and deadlocks will not occur. Completely arbitrary thread usage may not generate the results that an application desires. In particular, care must be taken when multiple threads are performing operations on the same VDPAU objects.
VDPAU implementations guarantee correct flow of surface content through the rendering pipeline, but only when function calls that read from or write to a surface return to the caller prior to any thread calling any other function(s) that read from or write to the surface. Invoking multiple reads from a surface in parallel is OK.
Note that this restriction is placed upon VDPAU function invocations, and specifically not upon any back-end hardware's physical rendering operations. VDPAU implementations are expected to internally synchronize such hardware operations.
In a single-threaded application, the above restriction comes naturally; each function call completes before it is possible to begin a new function call.
In a multi-threaded application, threads may need to be synchronized. For example, consider the situation where thread 1 is parsing the compressed video stream and calling VdpDecoderRender to decode pictures into VdpVideoSurfaces, while thread 2 is calling VdpVideoMixerRender to post-process and display those surfaces.
In this case, the threads must synchronize to ensure that thread 1's call to VdpDecoderRender has returned prior to thread 2's call(s) to VdpVideoMixerRender that use that specific surface. This could be achieved using the following pseudo-code:
    Queue<VdpVideoSurface> q_full_surfaces;
    Queue<VdpVideoSurface> q_empty_surfaces;

    thread_1() {
        for (;;) {
            VdpVideoSurface s = q_empty_surfaces.get();
            // Parse compressed stream here
            VdpDecoderRender(s, ...);
            q_full_surfaces.put(s);
        }
    }

    // This would need to be more complex if
    // VdpVideoMixerRender were to be provided with more
    // than one field/frame at a time.
    thread_2() {
        for (;;) {
            // Possibly, other rendering operations to mixer
            // layer surfaces here.
            VdpOutputSurface t = ...;
            VdpPresentationQueueBlockUntilSurfaceIdle(t);
            VdpVideoSurface s = q_full_surfaces.get();
            VdpVideoMixerRender(s, t, ...);
            q_empty_surfaces.put(s);
            // Possibly, other rendering operations to "t" here.
            VdpPresentationQueueDisplay(t, ...);
        }
    }
Finally, note that VDPAU makes no guarantees regarding any level of parallelism in any given implementation. Put another way, use of multi-threading is not guaranteed to yield any performance gain, and in theory could even slightly reduce performance due to threading/synchronization overhead.
However, the intent of the threading requirements is to allow for e.g. video decoding and video mixer operations to proceed in parallel in hardware. Given a (presumably multi-threaded) application that kept each portion of the hardware busy, this would yield a performance increase.
By established convention in the 3D graphics world, RGBA data is defined to be an array of 32-bit pixels containing packed RGBA components, not as an array of bytes or interleaved RGBA components. VDPAU follows this convention. As such, applications are expected to access such surfaces as arrays of 32-bit components (i.e. using a 32-bit pointer), and not as interleaved arrays of 8-bit components (i.e. using an 8-bit pointer). Deviation from this convention will lead to endianness issues, unless appropriate care is taken.
The same convention is followed for some packed YCbCr formats such as VDP_YCBCR_FORMAT_Y8U8V8A8; i.e. they are considered arrays of 32-bit pixels, and hence should be accessed as such.
For YCbCr formats with chroma decimation and/or planar formats, however, this convention is awkward. Therefore, formats such as VDP_YCBCR_FORMAT_NV12 are defined as arrays of (potentially interleaved) byte-sized components. Hence, applications should manipulate such data 8-bits at a time, using 8-bit pointers.
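For example, a single pixel write under this convention might look as follows (a sketch; mapped_pixels is a hypothetical 32-bit pointer to surface data, and the component positions within the word are assumed for illustration, with B in the least significant bits; consult the VDP_RGBA_FORMAT_B8G8R8A8 documentation for the authoritative layout):

    /* Sketch: store one opaque red B8G8R8A8 pixel. Because the pixel
     * is defined as a 32-bit value, assembling it with shifts (rather
     * than writing four separate bytes) is correct on any host
     * endianness. Component positions are assumed for illustration. */
    uint32_t b = 0x00, g = 0x00, r = 0xFF, a = 0xFF;
    mapped_pixels[0] = b | (g << 8) | (r << 16) | (a << 24);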
Note that one common usage for the input/output of Put/GetBits APIs is file I/O. Typical file I/O APIs treat all memory as a simple array of 8-bit values. This violates the rule requiring surface data to be accessed in its true native format. As such, applications may be required to solve endianness issues. Possible solutions include authoring static UI data files to match the endianness of the platform in question, or conditionally byte-swapping Put/GetBits data buffers at run-time based on platform endianness, as sketched below.
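A minimal sketch of the run-time byte-swapping approach (is_big_endian() is a hypothetical platform check, not a VDPAU facility):

    #include <stddef.h>
    #include <stdint.h>

    /* Swap each packed 32-bit pixel after reading little-endian file
     * data on a big-endian host, so the buffer can then be handed to
     * PutBits as an array of native 32-bit values. */
    static void fixup_pixels(uint32_t *pixels, size_t count)
    {
        if (!is_big_endian())   /* hypothetical platform check */
            return;
        for (size_t i = 0; i < count; ++i) {
            uint32_t p = pixels[i];
            pixels[i] = (p >> 24)
                      | ((p >> 8) & 0x0000FF00u)
                      | ((p << 8) & 0x00FF0000u)
                      |  (p << 24);
        }
    }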
Note: Complete details regarding each surface format's precise pixel layout are included with the documentation of each surface type. For example, see VDP_RGBA_FORMAT_B8G8R8A8.
Depending on the exact encoding structure of the compressed video stream, the application may need to call VdpDecoderRender twice to fill a single VdpVideoSurface. When the stream contains an encoded progressive frame, or a "frame coded" interlaced field-pair, a single VdpDecoderRender call will fill the entire surface. When the stream contains separately encoded interlaced fields, two VdpDecoderRender calls will be required; one for the top field, and one for the bottom field.
Implementation note: When VdpDecoderRender renders an interlaced field, this operation must not disturb the content of the other field in the surface.
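For example, for an MPEG-2 stream this might look roughly as follows (a sketch; VdpPictureInfoMPEG1Or2's picture_structure field carries the MPEG-2 picture_structure value, where 1 indicates a top field and 2 a bottom field):

    /* Sketch: two separately coded fields decoded into one surface.
     * MPEG-2 picture_structure: 1 = top field, 2 = bottom field. */
    VdpPictureInfoMPEG1Or2 info;  /* other fields filled in elsewhere */

    info.picture_structure = 1;   /* top field */
    vdp_decoder_render(decoder, surface, (VdpPictureInfo const *)&info,
                       top_count, top_buffers);

    info.picture_structure = 2;   /* bottom field; same target surface */
    vdp_decoder_render(decoder, surface, (VdpPictureInfo const *)&info,
                       bottom_count, bottom_buffers);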
Note that it is entirely possible, in general, for any of the VdpVideoMixer post-processing steps to require access to multiple input fields/frames.
It is legal for an application not to provide some or all of the surfaces other than the "current" surface. Note that this may cause degraded operation of the VdpVideoMixer algorithms. However, this may be required in the case of temporary file or network read errors, decode errors, etc.
When an application chooses not to provide a particular surface to VdpVideoMixerRender, then this "slot" in the surface list must be filled with the special value VDP_INVALID_HANDLE, to explicitly indicate that the picture is missing; do not simply shuffle other surfaces together to fill in the gap.
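For instance (a sketch, assuming the parameter order of VdpVideoMixerRender and that the nearest past picture occupies the first slot of the past array; the mixer, surfaces, and entry point are presumed to exist):

    /* Sketch: one past picture is missing (e.g. due to a decode error),
     * so its slot holds VDP_INVALID_HANDLE, not a shuffled-in surface. */
    VdpVideoSurface past[2]   = { prev_surface, VDP_INVALID_HANDLE };
    VdpVideoSurface future[1] = { next_surface };

    vdp_video_mixer_render(mixer,
                           VDP_INVALID_HANDLE, NULL,  /* no background */
                           VDP_VIDEO_MIXER_PICTURE_STRUCTURE_TOP_FIELD,
                           2, past,
                           curr_surface,
                           1, future,
                           NULL,                      /* whole source  */
                           dest_surface, NULL, NULL,
                           0, NULL);                  /* no layers     */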
The VdpVideoMixerRender parameter current_picture_structure applies to video_surface_current. The picture structure for the other surfaces will be automatically derived from that for the current picture as detailed below.
If current_picture_structure is VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME, then all surfaces are assumed to be frames. Otherwise, the picture structure is assumed to alternate between top and bottom field, anchored against current_picture_structure and video_surface_current.
Weave de-interlacing may be obtained by giving the video mixer a surface containing two interlaced fields, but informing the VdpVideoMixer that the surface's picture structure is VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME.
Bob de-interlacing is the default for interlaced content. More advanced de-interlacing techniques may be available, depending on the implementation. Such features need to be requested when creating the VdpVideoMixer, and subsequently enabled.
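For example (a sketch; vdp_video_mixer_create and vdp_video_mixer_set_feature_enables are the corresponding entry points obtained via VdpGetProcAddress, and the stream is assumed, for illustration, to be 1920x1080 4:2:0):

    /* Sketch: request temporal de-interlacing at creation time... */
    VdpVideoMixerFeature features[] = {
        VDP_VIDEO_MIXER_FEATURE_DEINTERLACE_TEMPORAL,
    };
    VdpVideoMixerParameter params[] = {
        VDP_VIDEO_MIXER_PARAMETER_VIDEO_SURFACE_WIDTH,
        VDP_VIDEO_MIXER_PARAMETER_VIDEO_SURFACE_HEIGHT,
        VDP_VIDEO_MIXER_PARAMETER_CHROMA_TYPE,
    };
    uint32_t      width  = 1920, height = 1080;
    VdpChromaType chroma = VDP_CHROMA_TYPE_420;
    void const   *param_values[] = { &width, &height, &chroma };

    VdpVideoMixer mixer;
    vdp_video_mixer_create(device, 1, features,
                           3, params, param_values, &mixer);

    /* ...then enable it; a requested feature is not active until
     * explicitly enabled. */
    VdpBool enables[] = { VDP_TRUE };
    vdp_video_mixer_set_feature_enables(mixer, 1, features, enables);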
If the source material is marked progressive, two options are available for VdpVideoMixerRender usage: the decoded frames may be passed to the mixer marked as VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME, with de-interlacing left disabled; or the content may be treated as pairs of fields, with de-interlacing (and possibly inverse telecine) enabled, in order to correct for content that is incorrectly flagged as progressive.
If the source material is marked interlaced, the decoded interlaced fields should always be marked as fields when processing them with the mixer; some de-interlacing algorithm is then always applied. Inverse telecine may be useful in cases where some or all of the interlaced stream is telecined film.
When modifying VDPAU, existing enumeration constants must continue to exist (although they may be deprecated), and do so in the existing order.
The above discussion naturally applies to enumerations that are defined "manually" using pre-processor macros, too.
New structures may be created, together with new API entry points or feature/attribute/parameter values, to expose new functionality.
A few structures are considered plausible candidates for future extension. Such structures include a version number as the first field, indicating the exact layout of the client-provided data. Such structures may only be modified by adding new fields to the end of the structure, so that the original structure definition is completely compatible with a leading subset of fields of the extended structure.
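As a purely hypothetical illustration of this pattern (the names below are invented for the sketch, not part of VDPAU):

    /* Sketch of the versioned-structure pattern. */
    #define EXAMPLE_INFO_VERSION 1

    typedef struct {
        uint32_t struct_version;  /* must be first: identifies the layout */
        uint32_t original_field;  /* present since version 0              */
        /* Fields below were appended in version 1; a version-0 structure
         * is a leading subset of this one, so code written against the
         * original definition remains compatible. */
        uint32_t new_field;
    } ExampleInfo;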
New functions may be added at will. Note the enumeration requirements when modifying the enumeration that defines the list of entry points.