Theorarm is an Ogg Theora/Vorbis decoding library optimised for use on ARM processors. It is based on the latest (at time of writing) Theora decoder as supplied by xiph.org, and my Tremolo library (which is in turn based upon the Tremor decoder also supplied by xiph.org).
Theora and Vorbis are (supposedly) patent-unencumbered, royalty-free video and audio compression schemes. Vorbis pretty much beats the pants off everything else out there (but of course the thing it gets compared to most is MP3). Theora is not (currently) quite as good as some of the alternatives, but encoder optimisation is rapidly closing the gap between it and its closest competitors. Everything you could possibly want to know about Ogg/Theora/Vorbis (and lots you don't) can again be found at xiph.org.
The standard Theora decoder, as supplied, currently contains no ARM code whatsoever. Furthermore, it relies on various support libraries, including libogg and libvorbis (available from the same source), to do Ogg bitstream handling and Vorbis decoding. Unfortunately libvorbis relies on floating point arithmetic, which makes it a non-starter on the many ARM processors that lack hardware floating point.
The obvious solution would therefore seem to be to use the Tremor lib (the integer-only Vorbis decoder) with Theora, but Tremor bundles its own versions of libogg and libvorbis within its code, and these are not directly compatible with the versions required by Theora.
For Theorarm, I have therefore made the Theora and Tremolo (my version of the Tremor lib) libraries compatible, and rewritten bits of Theora to work more efficiently on ARM machines. In some cases this involves tweaks to the C; in other cases, it means rewriting speed-critical sections entirely in ARM code.
Options can be set in an assembler header (lib/dec/common.s) to tell the assembler which type of code should be generated (ranging from vanilla ARMv4, through ARMv4 plus LDRD/STRD, to ARMv6 and NEON). The code will run on anything from ARMv4 upwards, but enabling the newer options increases performance. There is still scope for more optimisation here. The bit reading sections of the code assume a little-endian memory system, but this can probably be changed if required.
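To illustrate how bit reading code can end up tied to one memory system, here is a minimal C sketch (not Theorarm's actual code) of an MSB-first bit reader. The portable loader assembles a word from bytes one at a time; the faster variant does a single native 32-bit load and then byte-swaps, which is only correct on a little-endian host. All names (`load_be32_portable`, `load_be32_assume_le`, `bitreader`, `read_bits`) are hypothetical, and the sketch assumes MSB-first bit packing and a buffer padded so that at least four readable bytes follow the current position.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable: build a big-endian 32-bit word from 4 bytes, one at a time.
 * Works on any host, whatever its byte order. */
static uint32_t load_be32_portable(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Faster: one native 32-bit load followed by a byte swap.
 * ASSUMES a little-endian host; on a big-endian host the swap is wrong. */
static uint32_t load_be32_assume_le(const unsigned char *p) {
    uint32_t w;
    memcpy(&w, p, sizeof w);              /* single word load, native order */
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Hypothetical MSB-first bit reader over a padded byte buffer. */
typedef struct {
    const unsigned char *buf;
    size_t bitpos;                        /* bits consumed so far */
} bitreader;

/* Read n bits, 1 <= n <= 25 (the word holds 32 bits and up to 7 of them may
 * already have been consumed within the current byte). */
static uint32_t read_bits(bitreader *br, unsigned n) {
    uint32_t w = load_be32_portable(br->buf + (br->bitpos >> 3));
    uint32_t v = (w << (br->bitpos & 7)) >> (32 - n);
    br->bitpos += n;
    return v;
}
```

Swapping `load_be32_portable` for `load_be32_assume_le` inside `read_bits` gives identical results on a little-endian machine with fewer memory accesses; on ARMv6 and later the swap can be a single `REV` instruction. That kind of shortcut is exactly what makes such code depend on the host byte order.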
The library's API is broadly the same as the standard Theora decoder's (the differences are due to the different version of libogg present in Tremor).
The Theora/Tremor libs are supplied under a BSD license. See the exact license for details, but (AIUI) basically you can do what you want with them as long as you acknowledge that it's their work.
Theorarm is not released under a BSD license, but under dual license terms.
First off, it's released under the GNU GPL. If you are prepared to abide by the GNU GPL's terms, then you can use this code under the standard terms of that license.
For those for whom a GNU GPL release is no good, I also release it under a home-brew license of my own. The license terms are described below. I think they're pretty clear, but then IANAL, so if you want them clarified etc., feel free to rent me a lawyer for the afternoon.
Firstly, by "free" software I mean software that you do not derive any kind of revenue from. Shareware isn't free. Software that generates advertising revenue isn't free. Software bundled with other software/hardware that has to be paid for and can't be freely obtained elsewhere isn't free.
Free software does NOT need to give source away, but of course it can if it wants to.
Given this definition, if you want to use Theorarm as part of a free piece of software, please do. Obviously if you do give source away, then the Theorarm source should be clearly marked and all the distribution files (including this license) should be included unchanged.
If you want to use Theorarm (or a derivative of it) in any other type of system (be it software/hardware etc), then you need to contact me first.
I'm not looking to make my fortune here; shareware programs with a reasonable fee will almost certainly get permission to use the library as if they were 'free' code.
On the other hand, if someone wants to put Theorarm into a piece of hardware, I'd kinda like to make something for the time and effort I've put in.
Of course, it would just be nice to see it used - if you use Theorarm in a bit of software please do drop me a mail and let me know!
There are still some non-standard C-isms in there (the Tremor lib uses alloca, for example). VS2005 compiles and runs it fine, though.
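Since `alloca` is not part of standard C, one portable way to remove it (a sketch, not what Theorarm or Tremor actually does) is to use a small fixed-size stack buffer for small requests and fall back to `malloc` for large ones. The `scratch` type, `scratch_get`, `scratch_release`, the 1024-byte threshold and the `sum_of_squares` caller are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical threshold: requests up to this size stay on the stack. */
#define SCRATCH_STACK_BYTES 1024

/* Scratch space: a fixed stack buffer (suitably aligned via the union)
 * with a heap fallback for larger requests. */
typedef struct {
    union {
        max_align_t align;
        unsigned char bytes[SCRATCH_STACK_BYTES];
    } stack;
    void *ptr;
    int on_heap;
} scratch;

static void *scratch_get(scratch *s, size_t n) {
    if (n <= sizeof s->stack.bytes) {
        s->ptr = s->stack.bytes;          /* small: no allocation needed */
        s->on_heap = 0;
    } else {
        s->ptr = malloc(n);               /* large: heap, must be freed */
        s->on_heap = (s->ptr != NULL);
    }
    return s->ptr;
}

static void scratch_release(scratch *s) {
    if (s->on_heap)
        free(s->ptr);
    s->ptr = NULL;
    s->on_heap = 0;
}

/* Example caller that might once have used alloca(n * sizeof(int)).
 * Returns -1 if the heap fallback fails. */
static long sum_of_squares(const int *in, size_t n) {
    scratch s;
    int *tmp = scratch_get(&s, n * sizeof *tmp);
    long total = 0;
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        tmp[i] = in[i] * in[i];
    for (size_t i = 0; i < n; i++)
        total += tmp[i];
    scratch_release(&s);
    return total;
}
```

Unlike `alloca`, the heap fallback has to be released explicitly on every exit path. In a decoder it may be cheaper still to allocate work buffers once, in the decoder state, and reuse them for every packet.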
The assembly code is in ARM format, with a simple script to convert it automatically into gcc format at compile time.
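The conversion script itself is not described here, so the following is only a hypothetical illustration (written in C, although the real converter is a script) of the kind of line-by-line rewriting involved: armasm `;` comments become GNU as `@` comments, column-0 labels gain a trailing colon, and a few directives are renamed (`EXPORT` to `.global`, `IMPORT` to `.extern`, `DCD` to `.word`, `END` to `.end`). `convert_line`, `map_directive` and `dir_map` are invented names, and a real converter has to handle much more (`AREA`, macros, conditional assembly, `;` inside strings).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping table: a few armasm directives and GNU as equivalents. */
static const char *const dir_map[][2] = {
    {"EXPORT", ".global"},
    {"IMPORT", ".extern"},
    {"DCD",    ".word"},
    {"END",    ".end"},
};

/* Map the token tok (length n) if it is a known directive; otherwise the
 * token (e.g. an instruction mnemonic) passes through unchanged. */
static const char *map_directive(const char *tok, size_t n, size_t *outlen) {
    for (size_t i = 0; i < sizeof dir_map / sizeof dir_map[0]; i++) {
        if (strlen(dir_map[i][0]) == n && strncmp(tok, dir_map[i][0], n) == 0) {
            *outlen = strlen(dir_map[i][1]);
            return dir_map[i][1];
        }
    }
    *outlen = n;
    return tok;
}

/* Convert one armasm line to GNU as syntax.  Only handles ';' comments,
 * column-0 labels and the directives above; a sketch, not a full converter. */
static void convert_line(const char *in, char *out, size_t outsz) {
    const char *semi = strchr(in, ';');               /* armasm comment start */
    size_t code_len = semi ? (size_t)(semi - in) : strlen(in);
    while (code_len > 0 && isspace((unsigned char)in[code_len - 1]))
        code_len--;                                    /* trim trailing blanks */

    size_t lab = 0;               /* in armasm, labels start in column 0 */
    if (code_len > 0 && !isspace((unsigned char)in[0]))
        while (lab < code_len && !isspace((unsigned char)in[lab])) lab++;

    size_t i = lab;                                    /* find the opcode token */
    while (i < code_len && isspace((unsigned char)in[i])) i++;
    size_t tok_end = i;
    while (tok_end < code_len && !isspace((unsigned char)in[tok_end])) tok_end++;

    size_t op_len;
    const char *op = map_directive(in + i, tok_end - i, &op_len);

    int used = snprintf(out, outsz, "%.*s%s", (int)lab, in, lab ? ":" : "");
    if (tok_end > i && used >= 0 && (size_t)used < outsz)
        used += snprintf(out + used, outsz - used, "\t%.*s%.*s", (int)op_len, op,
                         (int)(code_len - tok_end), in + tok_end);
    if (semi && used >= 0 && (size_t)used < outsz)
        snprintf(out + used, outsz - used, "%s@%s", used ? " " : "", semi + 1);
}
```

For example, `"loop    SUBS    r0, r0, #1"` becomes `"loop:\tSUBS    r0, r0, #1"`, while instruction mnemonics and operands are passed through untouched, since both assemblers accept the same ARM instruction syntax for simple cases like this.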
You'd probably like some timings, to show how good Theorarm is, right?
My initial ARMv4 test device is an i-mate JAM running WinCE (a 416MHz XScale PXA272). My simple test app plays a 320x240, 25fps film with 48kHz stereo audio: the interrogation chapter ripped from the R2 Matrix DVD and re-encoded using the latest (at time of writing) Thusnelda encoder. Without post processing enabled, the code manages 38.5fps (i.e. comfortably full speed). With full post processing enabled, it only manages 23fps. (These figures are correct as of release 0.03.)
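To put the XScale figures in context, the per-frame time budget at 25fps can be compared with the measured decode rates. This is simple arithmetic on the numbers quoted, assuming they are sustained averages:

$$
t_{\text{budget}} = \frac{1}{25\,\text{fps}} = 40\,\text{ms},\qquad
t_{\text{no PP}} = \frac{1}{38.5\,\text{fps}} \approx 26.0\,\text{ms},\qquad
t_{\text{PP}} = \frac{1}{23\,\text{fps}} \approx 43.5\,\text{ms}
$$

So without post processing roughly 35% of each frame's budget is left over, while full post processing overruns the budget by about 3.5ms per frame.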
I have done some limited, simplistic profiling using a tool included in the source distribution. These figures should be taken as indicative rather than exact.
When playing with post processing enabled, 28% of the time is spent in the YUV to RGB conversion code, 30% is spent in the (optional) post processing deblocker, and 9% is spent in the (optional) post processing dering code. 3.5% of the time is "unaccounted for" (presumably in system calls, such as reading data). Every routine that accounts for more than 1.5% of CPU time has been ARM coded.
Without the overhead of post processing, the profile changes accordingly: the YUV to RGB conversion code becomes the dominant factor, accounting for 55% of CPU time. The largest other contributors are oc_frag_copy_list at 8.5%, oc_frag_recon_inter2 at 4% and the IDCT at 3% (all of which have been ARM coded).
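To show why colour conversion can dominate the profile, here is a plain C sketch of a per-pixel YUV 4:2:0 to RGB conversion. It uses the widely published BT.601 studio-range integer approximation (coefficients scaled by 256); this is an assumption, since Theorarm's ARM-coded routine and the exact colour matrix it applies are not described here. The function names, the tightly packed planes (no stride) and the 0x00RRGGBB output format are hypothetical choices for illustration.

```c
#include <stdint.h>

/* Clamp a fixed-point value (8 fractional bits) to 0..255.  Negative values
 * are clamped before shifting, since right-shifting a negative int is
 * implementation-defined in C. */
static uint32_t clamp_shift8(int v) {
    if (v < 0)
        return 0;
    v >>= 8;
    return (uint32_t)(v > 255 ? 255 : v);
}

/* Convert a tightly packed 4:2:0 frame to 0x00RRGGBB pixels.
 * y is w*h bytes; u and v are (w/2)*(h/2) bytes; w and h must be even.
 * Coefficients: BT.601 studio range, scaled by 256. */
static void yuv420_to_rgb32(const uint8_t *y, const uint8_t *u,
                            const uint8_t *v, int w, int h, uint32_t *rgb) {
    for (int row = 0; row < h; row++) {
        for (int col = 0; col < w; col++) {
            int chroma = (row / 2) * (w / 2) + col / 2; /* one U/V per 2x2 */
            int c = y[row * w + col] - 16;
            int d = u[chroma] - 128;
            int e = v[chroma] - 128;
            uint32_t r = clamp_shift8(298 * c + 409 * e + 128);
            uint32_t g = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
            uint32_t b = clamp_shift8(298 * c + 516 * d + 128);
            rgb[row * w + col] = (r << 16) | (g << 8) | b;
        }
    }
}
```

Every output pixel costs several multiplies and adds plus three clamps, and a 320x240, 25fps film needs 1.92 million pixels per second, which helps explain why this stage dominates once post processing is off. Hand-written ARM versions can gain by processing several pixels per iteration and reusing each chroma sample across the 2x2 block it covers.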
I've now moved my primary testing platform to a BeagleBoard (a Cortex-A8 based ARM development board) running at (I believe) 500MHz.
With post processing disabled, I can play a PAL DVD-sized film (720x576 at 25fps, with a 48kHz stereo audio track) in real time with software YUV2RGB. The limited profiling I've done, along with some back-of-an-envelope maths, suggests that we should just about be able to play 720p films if the YUV2RGB conversion is done in hardware.
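One way such an estimate could be reconstructed (an illustration under stated assumptions, not necessarily the calculation used above): assume 720p means 1280x720 at 25fps, that decode cost scales linearly with pixel count, that PAL playback is only just real time, and that YUV2RGB takes a share of CPU time similar to the 55% measured on the XScale without post processing. Then

$$
\frac{1280 \times 720}{720 \times 576} = \frac{921600}{414720} = \frac{20}{9} \approx 2.22,
\qquad
\frac{1}{1 - 0.55} = \frac{1}{0.45} = \frac{20}{9} \approx 2.22
$$

so, by Amdahl's law with a 55% offloadable fraction, moving the conversion off the CPU frees up almost exactly the factor needed for the larger frame size, which fits "just about".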
More details may be added here later as they become clear.
This should be considered a work in progress. There is lots more I'd like to do to the code, but I think the changes here are significant enough to make them available. Work on this project is likely to be punctuated by significant delays as real life gets in the way.
Obvious things to try next are to continue investigating the use of ARMv6/NEON extensions to speed up the hotspots, and to remove the use of alloca.
The original Tremor lib included the following disclaimer:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The same applies to Theorarm.