Some time in the last few months, video playback performance on my iMac has started to suck. Most videos now periodically pause for 5-8 seconds (video frozen, audio continuing). With many videos, this happens about every 30-60 seconds. With some, it's fine. On a "bad" video, if I skip back to just before the pause-point, it plays fine. If I skip back too far before the pause point, the pause happens at very close to the same spot. Sometimes it pauses a bit longer, and then audio stutters too.
This happens in Quicktime Player, Quicktime Player 7, and iTunes. Many of these are old videos that used to play fine. SD and HD.
Here's an example from today. It doesn't stutter when played full-screen HD on Youtube, but does if you download it. In Quicktime Player 7, this mp4 stutters at 00:02-09, 00:23-26, 00:46-50, 01:05-11, 01:25-32, 01:47-54, etc. Same spots every time if I don't rewind. In Quicktime Player X, it stutters slightly less often and for slightly less long -- 00:16-19, 00:35-36, 00:40-42, 01:19-22, 01:40-44, 02:04-06, etc. -- but again in the same spots every time. iTunes is similar.
Unfortunately, a bunch of things have changed on this machine over the last few months, and I'm not sure exactly when this problem began: I've upgraded from 10.9.1 to 10.9.2; replaced the RAM; replaced the internal drive. So there are several things that could have been the cause. It could be some new stupidity in the Quicktime library, or a disk performance problem, or... well, maybe it can only be those two. But I'm not sure how to test it.
Though the machine generally feels much faster than it did before that RAM upgrade.
- iMac 27" Mid-2010
2.93 GHz Intel Core i7
32 GB 1333 MHz DDR3
ATI Radeon HD 5750 1024 MB
HGST Deskstar 4TB 7200 RPM SATA III 6Gbps, 64MB Cache
Any idea what to test? I don't particularly understand how to interpret the output of iotop or fs_usage.
It could even be your gpu crashing, if you're using hardware acceleration. That could be caused by an age-related failure in gpu/video hardware (electromigration, capacitor failure), or by a power supply that isn't putting out enough power. A diagnostic test would be to configure quicktime to use a software decoder instead.
You can try to eliminate the other possibilities in a similar fashion. The ram is probably the easiest; just put the old ram back in temporarily and see if it still happens.
If that sort of thing doesn't yield any information, try stracing the video player.
How do you configure Quicktime Player to use a software decoder?
I don't have the old RAM any more. Also I haven't seen RAM actually go bad since like, the 90s, so that would be hard to believe.
stracing the video player will tell me exactly one thing, "this program makes a lot of system calls".
Stracing may tell you which syscall is consistently blocking, if any.
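For example, `strace -T` appends each call's wall-clock duration in angle brackets, so a few lines of Python can pick out the long blockers (a rough sketch; note OS X has no strace, so there you'd reach for dtruss instead, whose output format differs):

```python
import re

# Given `strace -T -p <pid>` output (Linux), flag syscalls that blocked
# longer than a threshold. The <seconds> suffix is what -T appends.
def slow_syscalls(lines, threshold=1.0):
    pat = re.compile(r'^(\w+)\(.*<(\d+\.\d+)>\s*$')
    hits = []
    for line in lines:
        m = pat.match(line.strip())
        if m and float(m.group(2)) >= threshold:
            hits.append((m.group(1), float(m.group(2))))
    return hits

sample = [
    'read(3, "\\x00\\x01", 4096) = 4096 <0.000042>',
    'read(3, "", 4096) = 4096 <5.312001>',  # a 5-second stall, like the pauses
    'poll([{fd=5}], 1, 100) = 0 <0.100110>',
]
print(slow_syscalls(sample))  # → [('read', 5.312001)]
```

If one fd's read() is what blocks for five seconds at a time, that points at the disk; if nothing blocks, the stall is internal to the decoder.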
Sorry, I've no idea. I usually use mplayer on linux myself.
You're right about the ram being unlikely to go bad, but that's the next easiest hardware problem to eliminate, or would have been.
Out of curiosity, have you tested playback of the same media using something that doesn't rely upon the quicktime underpinnings? Presuming it's h.264 inside that MP4 container, just using VLC should be test enough (http://www.videolan.org/).
If the file works in VLC, mark it down as the latest in a long string of regressions in Quicktime that Apple don't give a toss about.
VLC 2.1.2 stutters too.
But Movist 1.3.5 does not stutter at all.
Is there anything interesting in VLC's output during stutters when you start it with --verbose=2 from a terminal?
I don't have a command-line version of the VLC self-abuse kit.
You don't need one. The VideoLAN wiki says the executable resides under /Applications/VLC.app/Contents/MacOS/VLC and it should work just fine. Launch with -I rc to omit the GUI if it gets in the way.
So you're saying something like this happens - you download the thing from youtube and play it in QuickTime player where it stutters. Then you try it in VLC where it stutters. And then in Movist and it doesn't stutter? Sounds more videoish than disky, to me.
What happens when you change the codec binding in Movist from FFmpeg to QuickTime?
There's also a doodad called 'OpenGL Driver Monitor' that you may already have installed; if not, it's on Apple's dev site as a download titled something like 'Graphics Tools for Xcode - [your Xcode version]'. It tracks and graphs all sorts of stats, including VRAM usage. You have lots of it, but then again, lots of things use it these days; take a look when you see the stutter.
Might be a dead end but it's relatively quick and painless to check, compared to disk abuse.
Also, this is a good tool to test OSX disk subsystems for suitability to video playback:
What results am I looking for? I'm getting green checkboxes on the PAL and NTSC lines and gray Xs elsewhere. Looks like 95-105 MB/s I/O on average, though it varies a bunch.
That means your machine can't handle anything but SD content. That's pretty crap and less than half of the performance you should be getting from that drive. Something isn't right here.
For reference, my Macbook Air w/SSD passes all the tests apart from the 60fps HD tests.
Also demonstrably untrue, since I watch 1080p files with Movist every night and it never, ever stutters.
You're probably not watching (or recording) uncompressed 10-bit captures though. The rest might be explained by FileVault.
If FileVault is at fault then I'd expect to be able to quantify that numerically with something like the BlackMagic thing.
Sure, your throughput numbers are vastly higher than is needed to watch a compressed video, FileVault or no.
I have had some similar (video pauses for a few seconds, audio continues for a while longer) glitches, and they always seem to be accompanied by my external USB HDD spinning up out of sleep. Time Machine is the only thing that regularly accesses that drive, though I haven't ruled out other things.
That could also just mean movist is doing more aggressive read-ahead caches, smoothing over I/O bumps. And/or degrading colour depth or frame rates to achieve the smooth playback. I don't know movist internals so can't say if that's the case or not. It's a fine result in any case.
In my professional experience, the likely culprits for this sort of problem are, in order:
- Apple screwing up quicktime and not telling anyone, again.
- disk I/O issues
- playback software configuration
- video driver settings, in particular vsync-to-blank when you have more than one monitor
- corrupted data
For comparison, this is what a healthy SSD (w/ FileVault enabled) on a late-model MacBook Air can achieve:
In any case sounds like you're happy with movist so why not just use that. All the best.
Well, Movist is good for watching TV and movies, but pretty much only that. Its UI isn't very good for anything else.
That Blackmagic app won't tell you much directly, since all the results are geared towards evaluating the viability of super-high bitrate video formats, like ProRes etc, that are used in post production workflows.
It can rule out IO as the issue, though. It is a standard tool that is generally trusted to incur IO in a way that is characteristic of video reads/writes.
If you are getting ~100 MB/s, that is way way way higher than the bitrate that most 1080p h.264 content needs, which usually tops out at around 2000kb/s (I think my comparison math is right there?). Not sure about what other formats that you might be converting to (webm? flv?), but they would be similar bitrates unless you are intentionally converting to a ridiculous codec, such as a stream of tiffs accompanied by a wav file.
I think it's safe to say that you have ruled out IO, assuming you ran the tool against the same disk that the video files live on.
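The back-of-envelope version of that comparison, using the figures above (the 2000 kb/s number is on the low side for 1080p h.264, but even at several times that the conclusion holds):

```python
# Compare measured disk throughput against a typical 1080p h.264 bitrate.
# Figures taken from the thread; they're rough estimates, not benchmarks.
disk_MBps = 100                    # Blackmagic result
disk_kbps = disk_MBps * 8 * 1000   # megabytes/s -> kilobits/s
video_kbps = 2000                  # rough web-rip 1080p h.264 bitrate
headroom = disk_kbps / video_kbps
print(f"disk {disk_kbps} kb/s vs video {video_kbps} kb/s: {headroom:.0f}x headroom")
```

A couple hundred times more throughput than the stream needs, so raw sequential I/O can't be the bottleneck.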
If you throw a reproducibly-problematic video up somewhere on the innernets, I could take a look to see if I can spot a problem.
I linked to what had been (for hours) a reproducibly-problematic video above, but sadly it has now ceased to be reproducible.
Since you diddled your machine's insides recently, is it possible you accidentally disabled a fan? I ask, because my Mac Mini tells me it wants the vacuum up in its areas by making iTunes stutter for me--video and audio both.
Well, Apple loads custom firmware onto the drives they ship, and without that, the fans go on full-tilt. I am running "SSD Fan Control" to silence the hurricane. I'm hammering my disk right now with "Black Magic Disk Speed Test" and SSDFC says 2330 RPM and 126°F. This confirms that both fans and sensors are attached properly.
I'd thought that wasn't "firmware" so much as "hardware" but being pedantic forced me to discover this plausible theory:
...in which case it's interesting if SMART actually does blow that much to require the conveniently proprietary Apple solution.
(But it'd be 700 items down on the chain of likelihood that aftermarket SMART-polling software would be the problem since it doesn't occur at that interval, so no need to read this as advocating that.)
Wow, that's nuts.
However, my iMac is a "Late 2010", so it still has a cable using the temperature sensor pins -- and I replaced Hitachi with Hitachi, so presumably the cable is correct.
Interesting webpage, but having read it I think there's a simple reason for this design, and it's not "SMART actually blows that bad". (Although it does in fact blow. We're talking about something which came out of a PC industry committee.)
Apple puts fan control loops in the SMC, a microcontroller on every Intel Mac's motherboard. The SMC is completely autonomous, since it's required to control the fans correctly even if OS X crashes, or you boot a different OS, or the CPU executes the HCF instruction. So SMART wasn't an option, since using its data would require putting OS X and the system's CPU into the fan control loop. Instead, they've used various methods of connecting a simple thermistor on the HDD's PCB to the SMC.
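For a feel of what that autonomous loop looks like, here's a toy version: a drive thermistor reading mapped linearly to a fan speed, with no OS in the path. All constants are invented for illustration, not Apple's actual fan curve:

```python
# Toy sketch of an SMC-style fan control loop: clamp below/above a
# temperature ramp, interpolate linearly in between. Illustrative only.
FAN_MIN, FAN_MAX = 1100, 5500   # rpm at the ends of the ramp (assumed)
T_LOW, T_HIGH = 40.0, 70.0      # deg C where the ramp starts/ends (assumed)

def fan_rpm(temp_c):
    if temp_c <= T_LOW:
        return FAN_MIN
    if temp_c >= T_HIGH:
        return FAN_MAX
    frac = (temp_c - T_LOW) / (T_HIGH - T_LOW)
    return int(FAN_MIN + frac * (FAN_MAX - FAN_MIN))

print(fan_rpm(35), fan_rpm(55), fan_rpm(80))  # → 1100 3300 5500
```

The real thing also smooths transitions over time (hence the "avoid abrupt noise changes" engineering mentioned below), but the principle is the same: sensor in, rpm out, no OS X required.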
I can second temperature-triggered cpu throttling causing video stutter on a mac (though I've only seen it on older, now-considered-underpowered hardware). Sadly I don't know how to check for it, I just notice it when I manage to obstruct the airflow (using the laptop on the bed).
Y'know, I was possibly blaming the GPU lower down, but this i7 is also a "Turbo Boost" model, isn't it? Which is definitely enabled in OS X (see usual StackOverflow blind groping) and may have the similar mis-deterministic effect of "oh I see you'd benefit from some extra CPU cycles so I've just bumped up the clocks but whoops you still need them and now I'm too hot so I've got to cut you off after you possibly made some assumptions depending on them."
And at that point we're in the territory of "Oh, faster RAM let the CPU sit idle less so now it does more work and hits a thermal limit faster."
At which point... well, fuck, maybe it goes away when the OS X scheduler stops favoring the core under the worst spot on the heatsink, say. But "you" shouldn't have been allowed to count on that performance in the first place... but similar to the nightmare discussed with disk sensors, now how much opportunity does software have to adjust for that dynamic reality without taking overhead that wipes out the advantage?
(Disk fans are a simpler problem, but given the above discovery Apple apparently had to go nuts there to avoid abrupt changes that'd create a noticeable sudden difference in noise to the user.)
I guess the way to debug this is to log some really fine-grained temperature and clock data while watching top to see which cores the process is on (vs. which ones it isn't on, and all compared to Sometime When It's Not Doing It). I'm sure that will be joyful and easy.
[Somehow I can't resist this thread because ... well, yeah, the "How do we ever expect any of this to work?" has been on my mind for years now. Also amazing at how much still-fine-tuned cycle-counting realtime stuff co-exists with these features on modern systems to make them usually_work.]
Turbo Boost and thermal throttling are different functions on Intel CPUs.
Thermal throttling is expected to save the processor from fan failures (or blocked fan intake vents), so it is very aggressive. It forces CPU frequency down to the idle value, often somewhere around 800 MHz. If that's not enough, it can start forcing the execution pipeline to stall on most cycles, reducing the effective frequency close to zero. If even that's not enough, it can shut the system down.
Turbo Boost is active when the CPU is active, but not overheating. The minimum frequencies you'll see while Turbo is running the show are fast enough that there shouldn't be any conceivable scenario where video decoders stutter.
I think you're going a bit overboard worrying about hypothetical software failures in the face of varying performance. Nobody is writing applications like that, because it'd be crazy. These aren't hard realtime systems, so depending on precise execution timing is impossible regardless of hardware predictability. And even with constant clock frequencies, there's random behavior out of the memory hierarchy, branch predictors (AMD just disclosed a neural net based predictor in a recent x86), network, disks, etc.
I'm sort of aware what's up under the hood, but more often if I'm bothering to care I'm being amazed by kernel code and applications seem like big monstrous black boxes with deep buffers trusting someone else's libs to do the right things often enough. Just because that's actually how it is. (And then I sometimes read about game code doing stuff that requires cycle-counting and am again amazed that they found a place where it's still worthwhile to do that.)
There shouldn't be stutter but here we are discussing a rare corner case of sometimes-predictable stutter for some ungodly reason. Saying Turbo has nothing to do with thermal management seems a bit disingenuous (obviously there's also the power/eco/fan_noise aspect, but if the part could run at Turbo all the time without making a special 'Turbo' feature ... Intel still would have inflicted it on the market as a marketing term like VIIV or Vision, okay, point taken).
Linux is very special sometimes, but they've been having constant problems tuning anything to do with P-state transitions quite right. (I noticed with some AMD Socket 939-era hardware with relatively massive transition latency when, around 2012 or 2013, someone finally admitted that the "ondemand" governor was doing some violently wrong things with that hardware; Intel has generally taken less of a hit..)
...hey, after only 4 hours of Googling, actual numbers for current generation processors:
... clearly you can't blame noticeable video effects on the hardware transition latency alone, but stir this all up with the software to decide when to make the dynamic changes and count on the scheduler and application to do exactly the right thing with the realtime thread that, in certain cases, the scheduler will start protecting you from?
...obviously OS/application developers are given a few ways to screw up (generic overview of how many things can change and why, though only vaguely "how often"):
Linux has a special good habit of screwing up:
...and you'd think Apple would have a better chance to get it right by having Intel on speed-dial, but corner cases, and this is already an "obsolete" chip in "end-of-life" hardware; media is kind of "their thing" but is their test suite for this that much better than the one for SSL? (Particularly if it's entirely expected behavior at the 'system' end, and only the decoder library is bugged if mispredicting how 'realtime' it can actually be?)
P.S.: It seems like Apple was recently winning at this using newer Intel on-chip graphics, though whether that had a damn thing to do with power states or was just some other bug... well, yeah.
[I give up on pretending I'm making an argument here, but I'll hit post as a Collection of Interesting Links. In fact I get the impression Intel was planning for this ultra-high-granularity P-state flippery for like a decade, ever since they had to take "beyond drastic" measures with Prescott, but therefore I suspect a certain amount of the "It's 2014 and video playback still isn't a solved problem?" hell in the world is due to assuming it has no impact when it actually gets it right 99.99% of the time. Or maybe 30% if you were using the other common hardware with Linux at certain points, but you'd wish Apple would leverage one more 9 out of that whole joined-at-the-hip relationship.]
I'm not trying to say that Turbo has nothing to do with thermal management, more that temperature isn't its direct concern. Turbo is about power control, and the Collection of Interesting Links you found doesn't appear to have totally contextualized that. So I'll give it a shot.
Intel requires system integrators to design the cooling system to handle the processor's TDP (thermal design power) rating indefinitely, without allowing the chip to exceed its temperature limits. In return, Intel guarantees the processor won't exceed the TDP rating. If both sides do their engineering right, the CPU never gets hot enough for thermal throttling to trip.
Before Turbo, Intel tried to keep its side of the bargain by setting a fixed operating frequency based on power measurements of worst case customer code. This worked well enough at 1-2 cores per chip, but began imposing annoyingly low frequency limits at 4 cores and above, particularly when you consider that most people use only one to two cores most of the time.
Turbo is how Intel dealt with this issue. The processor's PCU measures its own power consumption, looking for opportunities to boost the clock above that pessimistic value without exceeding the TDP rating. IIRC the control loop might react to temperature in corner cases, but the focus is controlling power and letting temperature work itself out.
Also worth noting is that the PCU can physically shut off cores idled by the OS, and it self-limits boost based on how many cores are turned on. Max frequency (e.g. 3.6 GHz in jwz's iMac, where the baseline is 2.93 GHz) is only available with just one CPU core turned on. It doesn't completely shut off turbo with all cores turned on -- there's still scope for turbo since not all code uses the same amount of power -- but the maximum all-cores boost is usually much lower than the single core figure you'll see quoted in specifications.
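In multiplier terms, that looks roughly like this (base clock and single-core max taken from the thread; the per-core-count bin table is an assumption for illustration, not Intel's documented values for this part):

```python
# Turbo bins vs. active core count for jwz's i7 (2.93 GHz base, 133 MHz
# bclk, 3.6 GHz single-core max). The bins dict is illustrative only.
bclk = 0.133   # GHz per multiplier step
base = 2.93    # GHz baseline guaranteed at any load
bins = {1: 5, 2: 4, 3: 2, 4: 2}   # extra steps per active-core count (assumed)
for cores, extra in bins.items():
    print(f"{cores} active core(s): up to {base + extra * bclk:.2f} GHz")
```

The shape is what matters: the fewer cores awake, the more bins the PCU is allowed to spend on the survivors.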
Getting back to video decode, so long as there's some form of acceleration (and there usually is), it's likely to use much less than a single core's worth of compute power. Unless you're doing something else in the background, this is a recipe for the PCU shutting down all but one core and pegging its frequency at the upper limit.
p.s. I think you might have gotten a mistaken impression of how much the OS is supposed to be involved in the control loop. Intel's PCU is another of those more-or-less autonomous embedded microcontrollers. It can take outside policy directives, but it's designed to make most decisions on its own. Those linux screwups sound like old power management code interacting poorly with the PCU's mostly automatic management. These days, operating systems are supposed to use ACPI to declare intent, and let the hardware handle the rest.
For duck-and-doge's sake - you're trying to argue that TDP has nothing to do with temperature except in corner cases?
How about page 25 of the Sandy Bridge PDF? 12 sensors, used for Notification, throttle and shutdown and "PCU optimization algorithms" which reads to me as "Not melting the chip because people cry when that happens." Yeah, yeah, T-states and equivalent might not be under the exact banner of "Turbo" and the NSA isn't looking at your porn collection under that program, okay - to civilians this is all part of the New World Order of complicated thermal management.
I don't exactly disagree on most points otherwise, so just wanted to call you out on that part. I didn't really cotton to how much was offloaded to the chip itself vs. just living in the driver, because why the fuck would you do it that way - oh, right, because you can't have developers and victims of Linux distributions making warranty claims after failing to run the right code and frying chips because thermals or-fine-we-can-call-it-power, egg-chicken-black.
And that everything-on-one-core maximum-ultra-hyper-derp-Turbo-clock scenario seems like when you're going to meet real thermal throttling if something is very unusually wrong with the thermal situation directly around it (bubble or piece of cruft in the thermal compound, say). And the OS still has say over which cores things run on. Real-world thermal gradients here, although just for a couple AMD chips - Intel was making lots of noise about the point-heating problem back when The Register and The Inquirer were worth reading daily.
(The Linux screwups are just screwups, but for fun let's say they took the 'old' approach as still required on all hardware produced before 2010 or so and failed to notice it was broken maybe because it caused less pain on smarter new chips (or just because no one was looking). The big pessimization on the old S939 hardware was just for assuming frequency jumps were as-free as on Intel and as-necessary at aggressive intervals. It's amazing how much Linux development (still) seems to be done from the perspective of "all the world is my specific Packard Bell laptop and the bazaar will naturally select out all the assumptions that are wrong.")
Re: the callout, no, I am not arguing that, and if that's what you got out of it I wrote poorly. I'm trying to describe how Intel's Turbo operates, and so far as I know its only concession to overtemp is to back off a little at temperatures slightly below throttling limits, as a sort of soft landing for systems with slightly underspec coolers so that they don't trip the thermal throttle. Anything worse than slightly underspec is outside what they really want turbo to handle -- their goal in life is to treat TDP as an interface contract between themselves and system integrators.
That said... there is an exception, and it's the temperature sensitive "PCU optimization algorithms" in Sandy Bridge on page 25 of that PDF. I thought about talking about this, but didn't because it felt like I was writing too much of a book already. In SB onwards, so long as the chip is cold, the Turbo system is given a larger power budget and the ability to boost frequency above normal Turbo limits. Documented cooling requirements don't change at all. The idea is to sprint until the heatsink and CPU die warm up, then settle down to sustainable long-distance running.
This is mostly intended to improve laptop battery life. Under light, battery-friendly loads, the CPU's in a neverending cycle of "wake up, handle an interrupt, go back to sleep", with most of its time spent sleeping. If you graph power with respect to time, each wakeup is a narrow, square-edged pulse, and the area under the pulse is the number of joules drawn from the battery. This feature attempts to reduce the area under the curve by narrowing the pulses. Their height goes up a bit, but with care it's possible to actually improve battery life by going faster. As a side bonus you get better interactive responsiveness. Win/win.
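The arithmetic behind "narrowing the pulses" is simple enough to sketch (all numbers invented for illustration):

```python
# Race-to-idle: same work per wakeup, done either slowly at low power or
# quickly at boosted power, then sleeping. Energy is area under the curve.
idle_w = 0.5     # package power while asleep (watts, assumed)
period = 0.010   # one wakeup every 10 ms (assumed)

def joules_per_period(active_w, active_s):
    return active_w * active_s + idle_w * (period - active_s)

slow = joules_per_period(active_w=5.0, active_s=0.004)  # 4 ms at 5 W
fast = joules_per_period(active_w=8.0, active_s=0.002)  # 2 ms at 8 W, boosted
print(f"slow: {slow*1000:.1f} mJ, fast: {fast*1000:.1f} mJ per wakeup")
```

The boosted pulse is taller but enough narrower that the total energy drops, which is the whole trick.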
These frequencies aren't sustainable, but are available a lot longer than milliseconds. Independent reviewers have found that, starting from a cold heatsink, it usually takes 30 seconds to a minute of continuous above-TDP operation before the PCU is forced to back off to its normal turbo frequency and power limits.
Oh, and yes, localized hotspots are a concern. That's one of several reasons why they don't go nuts with 5+ GHz single-core turbo limits.
P.P.S.: Well, shit, I take back a good degree of suspicion for this particular problem after seeing the 5+ seconds part. Somehow I got it in my head that this was a ~200ms 'skip' here.
Which does sound roughly more like the kind of latency for a full GPU reset that it appeared OS X had some sort of mechanism for when I let cgminer run for a day on very similar hardware. (27" i7 iMac, whatever the first one with Thunderbolt was; using some utility to temporarily crank all the fans up helped but it did not seem happy with high "intensity" in any case).
(Although I think OS X does try to do the same trick some other experimental BSDs use and keep the thread parked on one core for even-that-long to avoid cache thrash, so if there's some sort of excruciatingly unique physical thermal problem and you get really lucky... now it's likelihood #699 on the list.)
Have you checked for system log spam during the times that videos are stuttering? Also, filter on the phrase "I/O error" in Console and see if you find anything like "disk0s2: I/O error". I'm dealing with a failing disk on one of my Macs, been seeing that one a lot recently.
For what it's worth, I youtubedown'd that Liars music video on a different, fully healthy (as far as I know) computer, and it doesn't stutter at all in QuickTime X. 16GB MacBookPro, PCIe SSD, 10.9.mumble.
There's nothing that seems related in system.log.
I've noticed similar video issues myself, I think since updating to 10.9. (I'm also on 10.9.2, on a 17" MacBook Pro with 16GB RAM.) Just to give you an extra data point.
I have also recently noticed problems with video playback on my mac mini in both VLC and interwebs streaming through chrome. I'm getting random pauses or audio cutting out followed by a superspeed audio playback to get caught up.
What does the free versus inactive memory utilization look like?
Awesomely enough, my reproducible test case has stopped stuttering and everything's playing fine right now. However: Swap used: 0 bytes, 4 days uptime. So that suggests that memory pressure isn't the problem, right?
Yes, probably not if swap is 0. Memory management in OS X historically hasn't been viewed the best: http://superuser.com/questions/317215/how-to-disable-mac-os-x-from-using-swap-when-there-still-is-inactive-memory
Only other thing I can think to check that hasn't been covered already is Spotlight doing its derpy indexing process.
I also can't seem to thread replies today. Fail.
Unnnf. The entire premise of that stackoverflow question and all its answers is confusion. "Inactive memory" is what OS X calls "pages that have been allocated and written to, but have not been touched lately." Inactive pages are the first to swap out. Asking to reclaim inactive memory without touching swap is like asking a kid to pick up his room but without putting anything in the toy chest or any drawers.
The persistent myth that "inactive memory" means "instantly reclaimable cache" is what made Apple change all the terminology for 10.9.
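You can watch those buckets directly with `vm_stat`; a quick parse of its output (format assumed from a 10.9-era machine, which reports counts of 4096-byte pages):

```python
# Parse `vm_stat`-style output into bytes per bucket. The sample text is
# a made-up capture in the usual format, not real numbers.
sample = """\
Pages free:                         123456.
Pages active:                       654321.
Pages inactive:                     222222.
Pages wired down:                   111111.
"""

def parse_vm_stat(text, page_size=4096):
    buckets = {}
    for line in text.splitlines():
        if ':' in line:
            key, val = line.split(':')
            buckets[key.strip()] = int(val.strip().rstrip('.')) * page_size
    return buckets

stats = parse_vm_stat(sample)
print(stats["Pages inactive"] // (1024 * 1024), "MB inactive")  # → 868 MB inactive
```

Note what the tool can't tell you: how much of that inactive bucket is clean (reclaimable immediately) versus dirty (must be written out first), which is exactly the distinction the argument here is about.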
Good lord, the size of the ignorant stinky deuce you just dropped on that stackoverflow question! You are living proof that a little knowledge is a dangerous thing.
No. "Inactive" is how OS X describes "allocated pages which haven't been touched lately". The inactive list can contain clean pages which do not need to be swapped out before being reclaimed. Clean/dirty is orthogonal to active/inactive.
If you believe this you have no idea what the fuck you are talking about. And that applies to any operating system with a reasonably well designed VM system, not just OS X.
You might want to study up on Mach VM objects, Mach memory objects, their relationships with processes and pages, their role in file caching, and particularly the VM object cache before strutting around boldly proclaiming that there ain't no such thing as inactive memory which is instantly reclaimable cache.
Is all inactive memory cache? Nope, that is indeed a misconception, but there is at least a grain of truth to it, and it is far less offensive a mistake than some of the shit you're peddling. Please, if you have any shame at all, delete that StackExchange answer.
Thanks for reconfirming the central property of the programmer community: actual questions are often met with either silence or the blind leading the blind, and useful information is often only extracted by first trolling with wrong answers.
I updated the answer to reflect that paging out inactive memory does not necessarily mean writing to swap, depending on things.
You didn't edit your SO spew much, so it seems I didn't quite get through to you. You know how you got super bent out of shape by the anonymous tech writer responsible for that Apple Knowledgebase article, and the SO answers which took it seriously? The writer was referencing a specific, real, entirely non-mythical Mach VM optimization. You claim that a kernel engineer wouldn't say such a thing? I can point you at the relevant source file if you like. (Even though I'm not a kernel engineer!)
The article is at worst guilty of picking just one thing as an example of what "inactive" memory can be, but it's not like an Apple KB article should be expected to be a complete primer on the esoterica of Mach virtual memory.
I assume that 4TB drive is an "Advanced Format" drive.
Is partition alignment an issue on Macs?
Did you partition the hard drive after you purchased it? If so, how?
Partitioned by the OSX installer. And again, if there was a problem with the drive I should be able to show that with numbers. Whatever voodoo you're suggesting would manifest numerically.
I experienced a similar issue on a Windows box. It was caused by the interaction of my cloud backup utility (Carbonite) and Windows Volume Shadowcopy Service. When Carbonite wanted to do a backup it would request a snapshot of my volumes (which were 4tb in total), which issued a freeze, which paused all VSS writers, which in turn made my system freeze for long periods of time (sometimes upwards of a minute). If you've got any sort of automated backup running, I'd try disabling it temporarily and seeing if that helps.
Er, fuhh, rare that I make a typo that posts an entire comment - of course when I was about to say that link isn't realllly relevant and put it near the end of all this...
I have no specific advice or even a specific guess, but just wanted to offer the nugget that AMD promotes dynamic clocking via "PowerTune" (what this marketing word actually means, someone with connections will have to explain) which is probably heavily enabled on a mobile part in a tight design like an iMac.
So you are likely to have probably-software making guesses about how much power hardware is consuming, and rapidly jiggling or throttling clock rates around to meet those limits (in fact I've only met "PowerTune" as a tunable in cgminer for playing with fake money), and mayhaps certain particular points in certain particular videos in certain decoders are a worst-case for the current code vs. last year's code.
[As far as blindly fiddling with the "PowerTune" knob - choosing "-10%" did seem to stop the heat output of my card from outrunning its heatsink and fan when pushing hashes as fast as possible all day. With less impact than simply dropping the clocks by trial-and-error - so I believe that's supposed to specify 10% "less watts" or "less heat", not explicitly 10% less performance.]
So combine the nondeterminism of having that enabled with the idea that "maybe a capacitor is aging somewhere" (and interesting that most of the complainers with more serious problems in the above link find temporary relief from asking software to do whatever software thinks increases voltage - maybe setting it to "0.05v above spec" is just bringing reality back up into spec) ... my unfortunately uneducated and therefore useless guess is that it's more likely to be "dumb drivers" than "hardware failure", but both Apple and ATI have long traditions of encouraging hardware upgrades to restore determinism when software fails (and then Apple also sometimes adds 'amazing it works at all' hardware failures - if I remember right the "snow iBook" era predated SMART support in the OS so wouldn't hint that those N-second freezes were everything blocking while I/O retried forever... and by some miracle the troubled spot of disk never interrupted the boot process or any application loading, maybe only a particular point in swap?).
P.S.: What I was trying to say is that intuition (but not documentation - is there any?) from the actual impact on cgminer suggests that the chips support about 4 "profiles" for gross power stepping, but then applies the "PowerTune" cap somewhat independently in terms of "what are the maximum number of transistors we're going to let turn on at once?" or some equivalent. So you've got the gross dynamic "power profile" clocking occurring at such-and-such intervals (above or below 1Hz?) and then mayhaps the "PowerTune" effect (if PowerTune is actually a thing) occurring between them.
So how does any of this stuff ever work deterministically at all? (I sort of have the impression that PowerTune is a way to brag about "we discovered that we had to do some really special dynamic crap to keep modern chips from melting at the speeds people want them to run at", whether in the driver or hidden in card firmware... or it could just be a word for Nothing as some claim and this is all in my head?)
The new drive may have been internally re-mapping some bad sectors, silently. I suspect the other program that was working properly was filling a larger read-ahead buffer, masking the issue.