I did some work on the GPU build of Flixel and sent the changes to the author of the GitHub repo, who mentioned he'd be updating it with some of my work. I was able to get a version working that fully utilizes GPU rendering; it's still in development but looks promising. I also made some later modifications that add an interpolation toggle.
Other than that, getting the hardware boost with "direct" is one of the best things you can do. Oh, and also fixing the FlxRect class to pool its objects: there's a major cumulative FlxRect instance leak that makes the GC run way more often than it needs to, which causes small lag spikes. I springboarded off of the ObjectPool code by moly to get a fix working.
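
To give a rough idea of what pooling the rects means, here's a minimal sketch. It's in TypeScript rather than ActionScript just to keep it short and self-contained, and it is not my actual fix or moly's ObjectPool code. The class name PooledRect and the get/put/created names are made up for illustration; the real FlxRect has more to it.

```typescript
// Hypothetical stand-in for FlxRect, for illustration only.
class PooledRect {
  // Released instances waiting to be reused.
  private static pool: PooledRect[] = [];
  // Counts real allocations, so the effect of pooling is visible.
  static created = 0;

  x = 0;
  y = 0;
  width = 0;
  height = 0;

  // Private so callers have to go through get() instead of `new`.
  private constructor() {
    PooledRect.created++;
  }

  // Reuse a released instance if one exists, otherwise allocate a new one.
  static get(x = 0, y = 0, width = 0, height = 0): PooledRect {
    let r = PooledRect.pool.pop();
    if (r === undefined) {
      r = new PooledRect();
    }
    r.x = x;
    r.y = y;
    r.width = width;
    r.height = height;
    return r;
  }

  // Hand the instance back to the pool. The caller must not touch it
  // afterwards and must not call put() twice on the same instance.
  put(): void {
    PooledRect.pool.push(this);
  }
}

// Simulate a per-frame temporary rect over 1000 frames.
for (let i = 0; i < 1000; i++) {
  const r = PooledRect.get(i, i, 16, 16);
  // ... overlap checks or similar would use r here ...
  r.put();
}
```

With the get/put pairing, that loop only ever allocates one rect instead of a thousand, so there's far less garbage for the GC to chase. The catch is that every temporary rect has to be handed back with put(), otherwise you're right back to leaking instances.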