While common, animation is not always done before sound. There are certainly other workflows. Still, I do agree that keying volume/panning is useful.
Audio keys can overlap. For example, a second key can start playing an audio clip while the clip started by an earlier key is still playing.
This means that if only a single volume timeline were provided for each audio event, it would not be possible to control the volume of each key individually, since overlapping keys would share it. The plan is for each audio event key to have its own curve, not for interpolating between keys, but to represent the volume over the audio clip's duration. A separate curve is also needed for panning. These will likely appear in the tree properties for each audio event key.
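To illustrate per-key curves, here is a minimal Python sketch. It assumes each audio event key stores its own volume and panning (balance) curves as piecewise-linear (position, value) points over the clip's normalized duration, where 0 is the start of the clip and 1 is the end. The names Curve, AudioEventKey and clip_position are hypothetical, not Spine runtime API, and the editor's actual curves may use a different representation, such as Bezier segments.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Curve:
    """Hypothetical piecewise-linear curve over the normalized clip duration.

    points: (position, value) pairs sorted by position, positions in 0.0..1.0.
    """
    points: list[tuple[float, float]]

    def value_at(self, position: float) -> float:
        # Hold the first/last value outside the defined range.
        if position <= self.points[0][0]:
            return self.points[0][1]
        if position >= self.points[-1][0]:
            return self.points[-1][1]
        # Find the segment containing position and interpolate linearly.
        positions = [p for p, _ in self.points]
        i = bisect_right(positions, position)
        (p0, v0), (p1, v1) = self.points[i - 1], self.points[i]
        t = (position - p0) / (p1 - p0)
        return v0 + (v1 - v0) * t


@dataclass
class AudioEventKey:
    """Hypothetical audio event key: each key owns its own curves."""
    time: float           # animation time at which playback starts, in seconds
    clip_duration: float  # length of the audio clip, in seconds
    volume: Curve
    balance: Curve        # -1.0 = full left, 1.0 = full right (assumed range)


def clip_position(key: AudioEventKey, animation_time: float) -> float:
    """Normalized position within the key's clip at the given animation time."""
    return (animation_time - key.time) / key.clip_duration


# Two overlapping keys: the second starts while the first clip is still playing,
# yet each one fades independently because each key has its own volume curve.
fade_out = AudioEventKey(time=0.0, clip_duration=2.0,
                         volume=Curve([(0.0, 1.0), (1.0, 0.0)]),
                         balance=Curve([(0.0, 0.0), (1.0, 0.0)]))
fade_in = AudioEventKey(time=1.0, clip_duration=2.0,
                        volume=Curve([(0.0, 0.0), (1.0, 1.0)]),
                        balance=Curve([(0.0, -1.0), (1.0, 1.0)]))

t = 1.5  # animation time where both clips are playing
print(fade_out.volume.value_at(clip_position(fade_out, t)))  # 0.25
print(fade_in.volume.value_at(clip_position(fade_in, t)))    # 0.25
```

A single shared volume timeline could not express this case, because at t = 1.5 the two clips are at different points in their own curves.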
A callback for volume changes isn't a great fit. More likely, the audio Event that triggered the start of playback will be able to be queried for the volume or panning at any playback position, e.g. "what is the volume at 25% of the audio clip's duration?". This polling approach keeps things simple while still providing all the functionality runtimes need.
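The polling idea can be sketched as follows, assuming a runtime that updates each playing sound once per frame. AudioEvent, Voice and poll_voices are hypothetical names, the curves are modeled as plain functions of the playback fraction (0.0 = clip start, 1.0 = clip end), and a real game toolkit would have its own handle for applying volume and balance to a playing sound.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AudioEvent:
    """Hypothetical audio event that can be queried at any playback fraction."""
    clip_duration: float                  # seconds
    volume_at: Callable[[float], float]   # fraction -> volume
    balance_at: Callable[[float], float]  # fraction -> balance


@dataclass
class Voice:
    """Stand-in for a game toolkit's playing-sound handle (hypothetical)."""
    event: AudioEvent
    elapsed: float = 0.0  # seconds since playback started
    volume: float = 1.0
    balance: float = 0.0


def poll_voices(voices: list[Voice], dt: float) -> list[Voice]:
    """Advance each voice by dt seconds, then poll its event for the current
    volume and balance. Voices whose clip has finished are dropped."""
    still_playing = []
    for voice in voices:
        voice.elapsed += dt
        fraction = voice.elapsed / voice.event.clip_duration
        if fraction >= 1.0:
            continue  # clip finished; a real runtime would release the voice here
        # Polling: "what is the volume at this fraction of the clip's duration?"
        voice.volume = voice.event.volume_at(fraction)
        voice.balance = voice.event.balance_at(fraction)
        still_playing.append(voice)
    return still_playing


# A 4 second clip that fades out linearly, centered.
fade = AudioEvent(clip_duration=4.0,
                  volume_at=lambda f: 1.0 - f,
                  balance_at=lambda f: 0.0)
voices = poll_voices([Voice(fade)], dt=1.0)  # 25% of the clip has played
print(voices[0].volume)  # 0.75
```

Because the runtime asks for values only when it needs them, it can update as often as its audio API allows, and no callback has to be registered or kept in sync with playback.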
Keep in mind that many game toolkits have relatively poor audio support: they may have high latency and/or a limited API that prevents using functionality the Spine editor provides. Consider your game toolkit's capabilities before relying on the volume/panning features the editor provides.
To follow up, 3.7.29-beta has volume and balance (the correct term for stereo) on events, both in setup mode and keyed (when a key initiates playing an audio file). However, the volume and balance are constant for the duration of playback for each audio event key; it is not possible to change their values over time. To do that, we need to allow a curve to be defined for each audio event key. 3.7 is dragging on too long already, so we've had to push that functionality to a future release.
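Since the volume and balance are constant per key in 3.7.29-beta, a runtime only needs to turn them into left/right channel gains once, when the key starts playback. Below is a minimal sketch, assuming volume in [0, 1] and balance in [-1, 1] (-1 full left, 0 centered, 1 full right). The function name stereo_gains is hypothetical, and the linear balance law used here is one common choice, not necessarily what Spine or your toolkit uses.

```python
def stereo_gains(volume: float, balance: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a constant volume and balance.

    Linear balance law (an assumption): the channel being balanced toward
    stays at full volume while the opposite channel is attenuated.
    """
    balance = max(-1.0, min(1.0, balance))  # clamp to the assumed range
    left = volume * min(1.0, 1.0 - balance)
    right = volume * min(1.0, 1.0 + balance)
    return left, right


# Computed once when the audio event key fires; no per-frame updates needed.
print(stereo_gains(1.0, 0.5))  # (0.5, 1.0)
```

Unlike panning a mono source, balance only adjusts the relative level of the two existing stereo channels, which is why a centered balance leaves both channels at full volume here.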