glDrawElements is the actual call that you want to minimize. That's where the draw is submitted to the GPU. The
for(id<CCRenderCommand> command in _queue)
line is where the Cocos2d draw calls are aggregated into GL calls.
(Without looking) I suspect that the command queue aggregation checks the draw state of each command and, if it matches the current one, appends the geometry to the current buffer. If the buffer overflows or the draw state changes, it 'flushes' the current batch and starts a new one.
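To make that guess concrete, here is a rough C++ sketch of the batching logic I have in mind. It is not the Cocos2d code: DrawState, DrawCommand, Batcher and countDrawCalls are names I made up for illustration, and the flush only bumps a counter where a real renderer would call glDrawElements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Cocos2d types.
struct DrawState {
    std::uint32_t texture;
    std::uint32_t blendMode;
    bool operator==(const DrawState& o) const {
        return texture == o.texture && blendMode == o.blendMode;
    }
};

struct DrawCommand {
    DrawState state;
    std::vector<float> vertices;
};

class Batcher {
public:
    explicit Batcher(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Append geometry to the open batch. Flush first if the draw state
    // changed or the new geometry would not fit in the buffer.
    void submit(const DrawCommand& cmd) {
        const bool stateChanged = !buffer_.empty() && !(cmd.state == current_);
        const bool wouldOverflow =
            buffer_.size() + cmd.vertices.size() > capacity_;
        if (stateChanged || wouldOverflow) {
            flush();
        }
        current_ = cmd.state;
        buffer_.insert(buffer_.end(), cmd.vertices.begin(), cmd.vertices.end());
    }

    // Close the open batch. An empty batch costs nothing.
    void flush() {
        if (buffer_.empty()) return;
        ++gpuDrawCalls_;  // a real renderer would issue glDrawElements here
        buffer_.clear();
    }

    int gpuDrawCalls() const { return gpuDrawCalls_; }

private:
    std::size_t capacity_;
    DrawState current_{};
    std::vector<float> buffer_;
    int gpuDrawCalls_ = 0;
};

// Run a whole queue through the batcher and report how many GPU
// submissions it would have produced.
inline int countDrawCalls(const std::vector<DrawCommand>& queue,
                          std::size_t capacity) {
    Batcher batcher(capacity);
    for (const DrawCommand& cmd : queue) {
        batcher.submit(cmd);
    }
    batcher.flush();  // submit whatever is left at the end of the frame
    return batcher.gpuDrawCalls();
}
```

With this sketch, ten commands that share a state and fit in the buffer collapse into one submission, while alternating states force one submission per command. That is why the number of render commands and the number of GL draw calls can differ so much.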
This design is extremely common in advanced render systems, and I would say it is 3/4 of the way to multithreaded. In a multithreaded system (such as Doom 3) there would be 2 command queues that receive CCRenderCommands: one actively receiving while the second is being submitted to the GPU.
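The two-queue idea can be sketched roughly like this in C++. Again, RenderCommand and DoubleBufferedQueue are hypothetical names, not anything from Cocos2d or Doom 3. The sketch only shows the swap at the frame boundary; a real multithreaded renderer would also need synchronization between the thread that records commands and the thread that submits them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical command type, for illustration only.
struct RenderCommand {
    int id;
};

class DoubleBufferedQueue {
public:
    // Update side: record into the queue that is currently receiving.
    void push(const RenderCommand& cmd) {
        queues_[writeIndex_].push_back(cmd);
    }

    // Frame boundary: the queue that was receiving becomes the one to
    // submit, and the other queue is cleared and starts receiving.
    void swap() {
        assert(writeIndex_ <= 1);
        writeIndex_ = 1 - writeIndex_;
        queues_[writeIndex_].clear();
    }

    // Render side: the commands recorded during the previous frame.
    const std::vector<RenderCommand>& submitting() const {
        return queues_[1 - writeIndex_];
    }

    // Number of commands recorded so far in the current frame.
    std::size_t pending() const { return queues_[writeIndex_].size(); }

private:
    std::array<std::vector<RenderCommand>, 2> queues_;
    std::size_t writeIndex_ = 0;
};
```

The point of the split is that the update code never touches the queue being submitted, so the two halves of the frame can overlap once each side runs on its own thread.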
I've looked over the code for SkeletonRender::draw and compared it to the Cocos2d docs. Everything is set up correctly to automatically utilize batching. Perhaps you are looking at the CCRenderCommand draw counter and not the GL draw counter. If you're debugging on a device, use Xcode and the Instruments performance analyzer.
If you're still concerned, the Cocos2d-x version of SkeletonRenderer uses a PolygonBatch class that aggregates the geometry before it is submitted to CCRenderer. This would reduce the CCRenderer draw calls to 1/10th, and as long as everything is on the same texture you should still only get a single glDrawElements call. PolygonBatch will essentially be doing the same thing as the CCRenderCommand _queue, and really it boils down to who is faster at the job. Since SkeletonRenderer has an informational advantage, it could go either way.
In any event, use Xcode profiling to determine whether there's enough evidence to justify optimizing.