In Part I, we looked at the differences between DTS:X and Atmos in terms of speaker layout. In Part II, we will be looking at the differences between the upmixing modes of each decoder.
A Brief History of Surround Upmixing: From Dolby Surround to Dolby Surround
Back in the early days of surround sound, before discrete digital audio, Dolby came up with Dolby Surround (the original version, confused yet?). It consisted of two tracks mixed as stereo, called Left Total and Right Total, containing the left and right channels, a center channel mixed in at -3dB, and a surround channel mixed in at -3dB and out of phase between the two totals. The original Dolby Surround decoder derived a single mono surround channel (played back through an array of speakers in the cinema, or two speakers at home) by extracting the out of phase information and routing it to the rear. Unfortunately, there was only about 4dB of separation between the surround channel and the front channels, so a 7kHz low pass filter was used to sort of “tone down” the surrounds. This obviously left much to be desired.

In 1987, the decoder was improved upon and renamed Pro Logic. In addition to the left, right, and surround channels, a center channel was added, for a total of 4 channels. Accuracy was improved by using logic steering (hence the name), which raised or lowered the volume of other channels depending on the dominant signal; for example, a strong mono signal would trigger the decoder to lower the volume of the left and right. In addition to logic steering, the surround channel was delayed so that any sounds leaking into the surrounds from the front channels arrived later, helping to decorrelate the channels. This increased the achievable channel separation by about 30dB. Dolby Pro Logic could technically be used on regular stereo content, but the results were poor, with too much content collapsing into the center channel; this was especially problematic for music.
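The matrix encode described above can be sketched in a few lines. This is a simplified model: it assumes the basic add/subtract matrix only, and ignores the 90-degree phase-shift network the real encoder applies to the surround channel.

```python
import numpy as np

def encode_lt_rt(left, right, center, surround):
    """Matrix-encode 4 channels into Lt/Rt, a minimal sketch.

    Center and surround go in at -3 dB (x ~0.708); the surround is
    added to Lt and subtracted from Rt, so it ends up out of phase
    between the two totals. (The real encoder also runs the surround
    through a 90-degree phase-shift network, omitted here.)
    """
    g = 10 ** (-3 / 20)              # -3 dB gain
    lt = left + g * center + g * surround
    rt = right + g * center - g * surround
    return lt, rt

# A passive decoder recovers center as Lt+Rt and surround as Lt-Rt:
l = np.zeros(4); r = np.zeros(4)
c = np.array([0.0, 1.0, 0.0, 0.0])   # a sound panned to the center
s = np.array([0.0, 0.0, 1.0, 0.0])   # a sound panned to the surround
lt, rt = encode_lt_rt(l, r, c, s)
center_out = lt + rt                 # surround cancels, center doubles
surround_out = lt - rt               # center cancels, surround doubles
```

With only sum and difference available, adjacent-channel separation is inherently limited, which is exactly why the later decoders added logic steering on top.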
A new solution was needed. Enter Dolby Pro Logic II. In the early 2000s, PLII was introduced as an improved version of the original Pro Logic. PLII could decode 5 independent, full range channels from any stereo content using a negative feedback circuit. The sum of the left and right channels was mixed into the center, and the spatial timing differences were routed into the surrounds. This worked extremely well with any content, providing a simulated surround experience that was nearly as accurate as a discrete mix. PLII was improved once again (and renamed PLIIx) by adding the ability to decode two rear channels in addition to the surround channels, derived by further processing the spatial differences between the surrounds. Around the same time, DTS released their own upmixing solution, called Neo:6. Like PLII and IIx, Neo:6 could decode 6 full range channels (including a mono rear channel that could be routed to either a single speaker or a pair) from stereo or 5.1 channel content. Unlike PLIIx, Neo:6 was capable of steering multiple sounds into different channels at once, by first splitting each channel into up to 19 frequency bands. So long as a sound lay within a different band, it could be steered independently of other sounds. This offered increased channel separation and improved accuracy.
Around 2009, both Dolby and DTS added the ability to extract height information, renaming the decoders PLIIz and Neo:X. PLIIz extracted height information by routing spatially diffuse sounds from the surrounds to a single pair of height channels mounted on the front wall. Unlike PLIIz, Neo:X had the ability to decode up to 11 channels: either two pairs of height channels mounted on the front and rear walls, or front height and front width channels. Similarly to PLIIz, Neo:X extracted diffuse information from either the front or surround channels (depending on the configuration), while deriving left and right width channels from the sum and difference of the fronts and surrounds. Audyssey also provided front height and width channels via Dynamic Surround Expansion (DSX); however, unlike Neo:X and PLIIz, no steering logic was involved. DSX worked by increasing the apparent width and height of the sound stage via simulation of ideal early reflections.
With the release of Atmos (and later, DTS:X), both decoders came with entirely redesigned upmixing solutions: Dolby Surround and DTS Neural:X. Both are able to fully decode as many extra channels as your AVR can process. So what’s different about Dolby Surround and Neural:X? Which one works better with 5.1/7.1 content?
Dolby Surround is significantly more advanced than PLII/x/z. The PLII decoders used a broadband decoder, which could only steer a single sound source at a time; Dolby Surround now uses a multiband decoder, similar to Neo:6, and is therefore able to steer multiple independent sounds. Dolby Surround first takes each pair of channels (front, surround, and rear) and processes them separately. If the source is stereo, it first decodes a 5.1/7.1 mix from the stereo source and then further processes this to derive the height channels. Each channel pair is first divided into frequency bands to allow independent processing of individual sounds. Then, each band is processed in the timing domain: direct sounds with timing differences can be steered into the bed channels, while diffuse/decorrelated sound can be steered into the height channels. This offers extremely good channel separation and highly accurate placement of sounds.
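To make the direct-vs-diffuse idea concrete, here is a toy per-band coherence measurement. This is emphatically not Dolby’s actual algorithm, just an illustration of the kind of cue a multiband decoder can exploit: correlated (direct) sound produces coherence near 1 in its band, while decorrelated (diffuse) sound produces low coherence, a plausible trigger for steering a band toward the heights.

```python
import numpy as np

def band_coherence(left, right, n_fft=1024):
    """Per-bin inter-channel coherence between two channels.

    A toy stand-in for multiband direct/diffuse analysis (NOT Dolby's
    actual decoder). Values near 1 mean the band is correlated (direct);
    values near 0 mean it is decorrelated (diffuse).
    """
    n_frames = len(left) // n_fft
    # Short-time spectra of both channels (no windowing, for brevity)
    L = np.fft.rfft(left[:n_frames * n_fft].reshape(n_frames, n_fft), axis=1)
    R = np.fft.rfft(right[:n_frames * n_fft].reshape(n_frames, n_fft), axis=1)
    # Magnitude-squared coherence per frequency bin, averaged over frames
    cross = np.abs(np.mean(L * np.conj(R), axis=0)) ** 2
    power = np.mean(np.abs(L) ** 2, axis=0) * np.mean(np.abs(R) ** 2, axis=0)
    return cross / np.maximum(power, 1e-12)

rng = np.random.default_rng(0)
direct = rng.standard_normal(16384)        # identical in both channels
diffuse_l = rng.standard_normal(16384)     # independent noise per channel
diffuse_r = rng.standard_normal(16384)
```

Feeding the same signal to both channels yields coherence near 1 in every bin; independent noise per channel yields values near zero.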
DTS Neural:X is said to be a completely new design, independent of Neo:6/X. There’s not a whole lot of detail on how it works exactly, but it’s said to be a spatial remapping algorithm, which uses knowledge of the sound’s position and the speaker locations to route sounds to the correct speakers. How vague and unhelpful.
More information and discussion on the two upmixers can be found here.
Neural:X vs Dolby Surround: Subjective Comparisons
Performance with multichannel content:
To start off, I used Star Wars: The Force Awakens. The first demo was done using a 5.1.2 top middle configuration, and then repeated later using a front height configuration. Obviously, the movie has lots of spaceship flyovers, and the soundtrack is modern enough to really let the decoders take advantage of the spatial cues. A second demo was done with The Dark Knight Rises. The Force Awakens is a DTS-HD MA 7.1 mix, and The Dark Knight Rises is a DTS-HD MA 5.1 mix. I also have a Vudu copy of The Force Awakens, which is a 7.1 Dolby Digital Plus mix; this is what I used for Dolby Surround. The way Neural:X handles a 5.1 mix vs a 7.1 mix is important, and will be explained further on down.
Star Wars: The Force Awakens, DTS-HD MA 7.1
The first demo was done using a top middle configuration. The first thing I noticed is that Neural:X seems to be a lot more “discrete” in its approach to placing things into the height channels. Unlike Dolby Surround, which tends to elevate the entire sound stage, the majority of the sound remains in the bed channels. Things like music, the reverberation of a large hall, or birds chirping never find their way into the heights. Only distinct, single sounds, such as a ship flying directly overhead, are pushed into the heights.
The scene that really caught my attention was the Millennium Falcon taking off from Tatooine.
As Rey and Finn are running towards the ship, TIE fighters are blasting away at the ground, sending rocks and sand flying into the air. As they approach the ship, the airborne rocks can be heard hitting the top of the ship overhead. As the ship takes off, sounds are placed into the heights so accurately, it’s as if the track were actually encoded with height data. Throughout the rest of the movie, Neural:X would every so often grab a distinct sound and move it to the heights with extreme precision, but the rest of the time it just sat there and did nothing. I often felt like there were a lot of sounds that should have been remapped that weren’t, such as rain, wind, birds, and echoes. Changing the speakers to a front height configuration changed where the upmixer pulled sounds from, with less of a focus on directly overhead sounds, instead placing sounds from the front L/R into the height channels.
The Dark Knight Rises, DTS-HD MA 5.1
I noticed quite a bit more overhead sound in this one, including the environmental sounds that were missing in The Force Awakens. Sounds such as Batman swooping down from above and the Bat aircraft flying overhead were moved into the height channels. As before, Neural:X was much more direct in its approach than Dolby Surround. There’s not much more to say beyond what was already said in the Star Wars demo; outside of more ambiance being pushed upwards, the experience was fairly similar. It appears Neural:X functions differently with a 5.1 mix vs a 7.1 mix, being more aggressive about placing sound overhead with 5.1 mixes. The reason for this will be discussed further on down.
For this demo, I used the Vudu version of The Force Awakens, since it’s encoded in 7.1 Dolby Digital Plus. My receiver does not allow me to cross mix between codecs; the only way to choose either upmixer freely is to use a PCM source, and at the time of the demos, I did not have a BDP capable of decoding audio into multichannel PCM.
With Dolby Surround, I noticed a lot more coming from the heights. The multichannel musical score was pushed upwards, giving the full image of a large concert hall. Wind, rain, birds, and pretty much anything that could possibly be overhead was pushed up into the heights. Overall, it made the sound stage feel massive, though sometimes I did feel like it was too much, especially with the musical score. It was a bit more difficult to tell what was coming from overhead during action scenes with the orchestra blaring, and adding the score to the heights made the music a bit louder than it would have been without them. Unlike with Neural:X, sounds from above were less “focused”. While pretty much everything Neural:X pushed to the heights was similarly pushed by DSU, the difference was how discrete it sounded. For one, direct sounds, such as the rocks falling on the Millennium Falcon, weren’t quite as loud with DSU, despite the overall loudness of the height channels being greater due to more sounds being routed to them, nor were they as focused. With Neural:X, the falling rocks appear only in the height channels, while DSU seems to spread some of the sound into the surrounds. This is both good and bad, depending on the content being upmixed. With wind and rain, it offers an exceptionally three dimensional sound field; with single origin sounds, such as the falling rocks, it makes them harder to pinpoint as coming from above.
Since beginning this post, I have demoed multiple movies using both upmixers, and the results are nearly the same regardless of the movie. DSU seems to really expand the whole sound stage, while Neural:X focuses more on individual sounds. It’s hard to say which one I prefer, since they both have their strengths and weaknesses. DSU does a much better job with environmental sounds, such as birds chirping, rain, and wind. It also does a better job of portraying the size of the sound stage; a large room with lots of reverb sounds huge, whereas Neural:X tends to ignore this. Neural:X, on the other hand, does such a good job of remapping pinpoint sounds that it sounds almost as good as discrete Atmos.
Performance with 2ch Content
To test the performance with two channel content, I used both movies and music. For movies, I ripped the Blu-rays to MKV, extracted the audio tracks, opened them up in Reaper, created a regular stereo downmix, and then muxed the 2ch PCM file back into the MKV. This allowed me to switch back and forth between the multichannel mix and the two channel mix.
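For reference, a stereo downmix of a 5.1 mix boils down to a handful of gain coefficients. A minimal sketch, assuming ITU-style Lo/Ro coefficients (center and surrounds folded in at -3dB, LFE discarded), which is the kind of downmix the Reaper session produced:

```python
import numpy as np

def stereo_downmix(fl, fr, c, lfe, sl, sr):
    """Lo/Ro stereo downmix of a 5.1 mix (a sketch, assuming ITU-style
    coefficients: center and surrounds at -3 dB, LFE discarded)."""
    g = 10 ** (-3 / 20)          # -3 dB fold-down gain
    lo = fl + g * c + g * sl
    ro = fr + g * c + g * sr
    return lo, ro

# A sound in the center lands equally (at -3 dB) in both output channels:
fl = np.array([1.0, 0.0]); fr = np.zeros(2)
c = np.array([0.0, 1.0]); z = np.zeros(2)
lo, ro = stereo_downmix(fl, fr, c, z, z, z)
```

Crucially, this fold-down preserves the relative phase of the channels, which is what gives the upmixers something to work with later.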
Naturally, I chose The Force Awakens once again. With Dolby Surround, it was really difficult to tell the difference between the multichannel mix and the 2ch upmix. The only minor difference was that some sounds originally mixed into the front left and right found their way into the surrounds, most of these being blaster shots or lightsaber noises. DSU impressively preserved the integrity of the original surround mix; objects panning from the front channels to the rear channels in the original surround mix followed almost exactly the same pattern in the upmix. Unlike PLII, there was absolutely no channel bleed whatsoever. With PLII, things like dialogue, which was matrixed to the center, could be faintly heard in the left and right channels, especially when lots was going on. Not so with DSU. The channel separation was so good it might as well have been a discrete mix. Switching over to Neural:X, the first thing I noticed was a ton of comb filtering and an incoherent sound field. With DSU, each channel carried its own separate content with no overlap; with Neural:X, it almost seemed as if I had a center channel and two mono surround channels, which fed the front, height, and surround speakers. Unhooking the center channel, I confirmed this to be the case. The separation between the front left, top left, and surround left was nearly nonexistent, almost as if I were using an old Dolby Pro Logic decoder with extra speakers attached. Changing the configuration from top middle to front height, then rear height, and from 5.1.2 to 5.1, 7.1, and even 4.1 produced the same results. I’m not entirely sure why or what’s going on here, and I intend to do some further testing, but at this point, I would strongly recommend against using Neural:X for 2ch content.
Testing with music, I started out with classical. Despite the fact it’s recorded in stereo, all of the phase and spatial information is preserved; reflections from the room arrive at the microphones at different times, which gives the decoder lots to work with. DSU flawlessly reproduced the imaging of a large auditorium. Comparing stereo and 5.1.2, DSU really opened up the sound stage in a way that sounded impressively close to what one would hear at a live performance. As with movies, channel separation was exceptional. When switching back to stereo, I instantly felt like the sound stage was flat and lifeless compared with DSU. Not only does DSU bring out all that missing spatial information, it also brings out subtle nuances that are lost in the stereo mix, similar to the difference between stereo and mono.
With pop and rock music, the results are obviously different, since the way it’s recorded is much different from acoustic recordings such as an orchestra; for example, an electric guitar part is given a stereo image by recording it twice and mixing each take into the left and right channels. Studio recordings are generally made in an acoustically treated room, so any reverberation is added during the mixing process. Either way, DSU did a fantastic job of upmixing the original stereo recording. I never felt like the decoder added to or distorted the original stereo image; it simply added depth in a way that sounded completely natural. Generally with stereo, in order to actually perceive a three dimensional sound stage, one must sit directly in the sweet spot; move out of it and the illusion collapses. With DSU, you get the same image you’d get in the sweet spot no matter where you sit. As with classical, I much preferred DSU to stereo. The upmixer seems to do a good job no matter what the content is.
I will note that, unlike PLII, there is no dedicated music mode; however, it doesn’t need one. By default, all sounds coming from the center, such as vocals, are mixed hard into the center channel. Some folks might not like this, but I personally feel it’s more natural: if one sits in the sweet spot, vocals come from the center anyway. For those who prefer the image be spread out more, most receivers have a center spread option.
With Neural:X, I had similar issues as with movies; I also noticed a huge loss of bass.
Objective Comparisons: Decoding the Decoders
As we’ve previously established, DSU and Neural:X function a bit differently from their horizontally limited predecessors. Dolby now uses a multiband decoder, vs the broadband PLII, and God knows what DTS is doing to derive the extra channels. Deriving heights is much more complicated than deriving a center and surrounds. It’s simple enough to matrix sounds that are out of phase into the surrounds, but how does one pull vertical information out of a two dimensional mix? That’s what I set out to discover.
To dissect the decoders, I made an educated guess as to what I thought they were doing in order to remap sounds and created multichannel sound files with various phase and channel manipulations based on these guesses. I have included the files used at the bottom of this post.
With Dolby Surround, sounds can be shifted into the height channels by a simple timing delay between a specific set of channels. For example, a 10ms delay between the left and right channels with full band pink noise will push the sound into the top front channels; add that same delay to both the surround and front channels and it gets pushed into the top middle; add it to the rear channels only and the sound gets pushed into the top rears. Adding a 10ms shift between the front and surround channels doesn’t seem to stir up the decoder, since it processes the channel pairs separately. Using different amounts of delay (i.e. 15ms vs 10ms) slightly changes the overall effect. Splitting a signal into different frequency bands (for example, pink noise high passed at 500Hz with a 10ms L/R delay, plus a pure 440Hz sine wave with no delay) allows independent steering of sounds with fantastic separation. One thing I did notice with multichannel content is that sounds remapped into the heights do not entirely leave their original channels, but are spread between the bed and height channels. I believe Dolby did this on purpose, since, as you will see shortly, the decoder is capable of completely removing sounds from the front channels when remapping them to the surrounds in 2ch mode. This would explain why I find it less discrete sounding than Neural:X.
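Generating such a delayed channel pair is straightforward. A minimal sketch (white noise stands in for pink noise to keep it short, and writing the multichannel file to disk is omitted):

```python
import numpy as np

FS = 48000  # sample rate

def delayed_pair(signal, delay_ms, fs=FS):
    """Return (left, right) with the right channel delayed relative to
    the left, the kind of inter-channel timing offset used in the
    test files described above."""
    d = int(fs * delay_ms / 1000)        # 10 ms -> 480 samples at 48 kHz
    left = np.concatenate([signal, np.zeros(d)])
    right = np.concatenate([np.zeros(d), signal])
    return left, right

# One second of noise with a 10 ms L/R delay, the offset that pushed
# sound into the top front channels in my tests
noise = np.random.default_rng(0).standard_normal(FS)
l, r = delayed_pair(noise, 10)
```

The same helper applied to the surround or rear pairs produces the top middle and top rear variants described above.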
What I was really curious about was how DSU would determine what should go into the surrounds and what should go into the heights with 2ch content. From my listening tests, it does a pretty good job sorting it all out, but how? With the old Pro Logic II decoders, sounds could be placed into the surrounds via a complete phase reversal, or a timing delay. Using full spectrum pink noise with the left and right channels 180 degrees out of phase, the sound is pushed to the surround channels only. Curious to see how I could get a sound to come from the heights independently of the surrounds, I created a 440Hz tone, 180 degrees out of phase, overlaid with 500Hz high passed (48dB) pink noise, 180 degrees out of phase with an added 10ms delay between the L/R channels. The 440Hz sine wave moved to the surrounds, and the band limited pink noise moved into both the surrounds and the heights, in a similar fashion to what happened with multichannel content. Playing back only the delayed pink noise, without the phase inversion, spread the sound equally throughout the 7 channels.
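The composite test signal described here can be sketched as follows. This is a simplified stand-in: white noise replaces the pink noise and the 48dB high pass filter itself is omitted.

```python
import numpy as np

FS = 48000
t = np.arange(FS) / FS

# 440 Hz tone, right channel polarity-inverted (180 degrees out of phase)
tone_l = np.sin(2 * np.pi * 440 * t)
tone_r = -tone_l

# Noise layer (stand-in for the high-passed pink noise): out of phase
# AND delayed 10 ms between the channels
noise = np.random.default_rng(1).standard_normal(FS)
d = int(FS * 0.010)                      # 10 ms = 480 samples
noise_l = np.concatenate([noise, np.zeros(d)])
noise_r = -np.concatenate([np.zeros(d), noise])

# Mix the two layers into the final 2-channel test signal
left = np.concatenate([tone_l, np.zeros(d)]) + noise_l
right = np.concatenate([tone_r, np.zeros(d)]) + noise_r
```

Because the two layers occupy different frequency regions, a multiband decoder can steer them independently: the phase-only tone to the surrounds, the phase-plus-delay noise onward to the heights.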
Overall, DSU does equally well with multichannel and stereo content, and its method for deriving height channels from a 2ch mix seems to work quite well.
Remember Neural:X being described by DTS as a “spatial remapping algorithm”? Looks like that’s exactly what it is. Unlike DSU, which upmixes based on variations in the time and frequency domains, Neural:X operates entirely in the spatial domain. What does this mean exactly? Neural:X decodes height information based on the spatial relationship of sounds in a 5.1/7.1 mix. During my testing, I managed to get pink noise to play only from the top middle channels by spreading it equally across the front and surround/rear channels. How it handled this was significantly different with a 7.1 mix vs a 5.1 mix, with a 7.1 mix allowing far greater accuracy. With a 7.1 mix, the only way to get sound to come from the heights is to spread it between the rear surrounds and the fronts. Why were the rears chosen instead of the surrounds? I have no idea, but I suspect DTS has their reasons, because subjectively it seems to work much better with 7.1 movies than with 5.1. Mixing pink noise equally between the front left and rear left channels remapped the sound into the top middle left; the same was obviously true of the right channels. Sounds can be panned by altering the volume between channels; for example, pink noise mixed into the front and rear channels can be remapped into the front height channels by reducing the volume of the rear channels by 12dB. Using this, I was able to create a full 360 degree pan around the room using pink noise and a 7.1 channel mix.
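The level-panned test buffers can be sketched like this. The 7.1 channel ordering here is an assumption (check your authoring tool’s layout), and white noise again stands in for pink:

```python
import numpy as np

FS = 48000
CH = ["FL", "FR", "C", "LFE", "SL", "SR", "RL", "RR"]  # assumed 7.1 order

def height_pan_test(rear_cut_db=0.0, seed=0):
    """7.1 test buffer: identical noise in FL and RL, the spread that
    Neural:X remapped to the top middle left in my tests. Cutting the
    rear level (e.g. rear_cut_db=12) pans the remapped sound toward
    the front heights."""
    noise = np.random.default_rng(seed).standard_normal(FS)
    buf = np.zeros((FS, len(CH)))
    buf[:, CH.index("FL")] = noise
    buf[:, CH.index("RL")] = noise * 10 ** (-rear_cut_db / 20)
    return buf

top_middle = height_pan_test()         # equal spread -> top middle left
front_height = height_pan_test(12.0)   # rear at -12 dB -> front height
```

Stepping `rear_cut_db` smoothly from a large negative value through zero to a large positive one, channel pair by channel pair, is how the 360 degree pan was built.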
As with DTS:X encoded material, sounds mixed into the rear channels, played back over 5 bed channels, are rendered as a phantom image between the surrounds.
Neural X offers fantastic channel separation, but unlike DSU, sounds remapped into the heights are entirely removed from their original channels. If we are indeed trying to simulate an object based audio experience from traditional multichannel content, this is a good thing. Phase and timing manipulation has no effect on Neural X when using multichannel tracks, however with stereo content, out of phase information is directed to the surround or rears, and from there directed into the heights as is done with multichannel content. Like DSU, Neural X is capable of steering sounds independently in separate frequency bands.
Earlier on, I mentioned how Neural:X appeared to introduce awful comb filtering and cancellation with 2ch content. Playing back 2ch pink noise, 180 degrees out of phase between the L/R channels, sent the sound into the surrounds, just like DSU did. The difference, however, is that unlike DSU, Neural:X played the sound back still 180 degrees out of phase, causing cancellation between the surrounds. Both DSU and PLII undo the phase inversion when remapping sounds to the surrounds; Neural:X doesn’t for some reason, and I suspect this is why there is so much cancellation and comb filtering with 2ch content. If an out of phase sound is remapped to the surrounds from the fronts, and that sound is played back still out of phase, it’s going to create destructive interference between the fronts and surrounds. I plan to contact DTS about this, and will update this post once I hear back from them.
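The interference itself is easy to model: summing a signal with an uncorrected, phase-inverted copy cancels, while restoring the phase reinforces. This is a toy model that ignores room acoustics and propagation delay; it only illustrates why the uncorrected inversion is a problem.

```python
import numpy as np

FS = 48000
t = np.arange(FS) / FS
front = np.sin(2 * np.pi * 440 * t)      # signal left in the fronts

surround_inverted = -front    # what Neural:X appears to do (phase kept)
surround_corrected = front    # what DSU/PLII do (phase inversion undone)

# Acoustic sum at the listening position (toy model)
cancelled = front + surround_inverted     # destructive interference
reinforced = front + surround_corrected   # constructive sum
```

In a real room the two arrivals are offset rather than perfectly aligned, so instead of total silence you get frequency-dependent peaks and nulls, which is exactly the comb filtering heard in the demos.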
So Which One is Better?
Based on the tests above, I’d call Neural:X a true “object based” upmixer, since it literally translates sounds from 5.1 and 7.1 mixes into objects with spatial coordinates. I’m definitely blown away by its extreme precision, especially with 7.1 channel mixes. While Dolby Surround does a good job of extracting height information, it’s not quite the same as an actual object based experience. Nevertheless, both decoders have strengths and weaknesses, and it’s tough to say either one is better as a generalization; both DSU and Neural:X excel at certain things while falling short in others. I can certainly say DSU is the clear winner with 2ch content, due to the issues with Neural:X described above. With multichannel content, it’s really down to preference. Regardless, both upmixers are significantly more advanced than their predecessors. At some point, I will get around to testing both with a 7.1.x setup to see which one is better at upmixing 5.1 to 7.1, but for the time being, I’m stuck with 5.1.2, so this will have to do.
Which do you prefer, Dolby Surround or Neural:X? Leave a comment below explaining which and why.
Be sure to try out the test files I uploaded on your own system. If you’ve got a setup with more channels than 5.1.2, I’d love to hear how they perform with a greater number of channels.