Notes:
This sounds reasonable until you start to wonder how to measure “working software”. I don't think this can be measured directly – and therefore I doubt its utility as a “primary measure”.
It may be difficult even to reach agreement on whether software is working. I have certainly encountered cases where external consultants had to be called in to give an opinion on the question.
More importantly, are there gradations of “working” - can software be “95% working”?
If we allow gradations, is the number of UAT tests that the software passes a measure of the proportion of the software that is “working”? (And what if a test is wrong?) Or is it the number (or value) of business functions that it supports?
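To make the ambiguity concrete, here is a rough sketch of how those candidate measures can disagree about the same piece of software. Everything in it is made up for illustration: the `BusinessFunction` class, the function names, the value weights and the test counts are all hypothetical, and it assumes (debatably) that a business function only counts as supported when every one of its UAT tests passes.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    """A hypothetical business function with made-up UAT results."""
    name: str
    value: float        # assumed relative business value (arbitrary units)
    tests_passed: int
    tests_total: int

    @property
    def supported(self) -> bool:
        # Assumption: "supported" means every UAT test for it passes.
        return self.tests_passed == self.tests_total


def uat_pass_ratio(functions):
    """Proportion of all UAT tests that pass."""
    passed = sum(f.tests_passed for f in functions)
    total = sum(f.tests_total for f in functions)
    return passed / total


def supported_count_ratio(functions):
    """Proportion of business functions that are fully supported."""
    return sum(1 for f in functions if f.supported) / len(functions)


def supported_value_ratio(functions):
    """Proportion of total business value that is fully supported."""
    total_value = sum(f.value for f in functions)
    return sum(f.value for f in functions if f.supported) / total_value


# Illustrative data only: the most valuable function fails a single test.
functions = [
    BusinessFunction("take order", value=50, tests_passed=9, tests_total=10),
    BusinessFunction("print invoice", value=30, tests_passed=40, tests_total=40),
    BusinessFunction("export report", value=20, tests_passed=50, tests_total=50),
]

print(f"UAT tests passing:   {uat_pass_ratio(functions):.0%}")         # 99%
print(f"Functions supported: {supported_count_ratio(functions):.0%}")  # 67%
print(f"Value supported:     {supported_value_ratio(functions):.0%}")  # 50%
```

With these invented numbers the same software is 99%, 67% or 50% “working”, depending on which measure you pick, and none of the three says anything about whether the one failing test is itself correct.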