Table 4, the data would seem to indicate that there isn't any appreciable difference between 6 stops and 1 stop, i.e. the probability of an event for 6 stops on nitrox is 0.9% vs. 0.7% for 1 step nitrox. The 3 stop probability is 1.1%. Was the 2nd line for 6 stops a 2nd run of data? If so, there is no inference that multiple stops have any predictive value for DCS.
Sorry for posting on an old thread, but I believe this is a misinterpretation of what those numbers are, and I'd like to clarify based on my understanding. I think the paper's results are interesting, but unfortunately it's lacking some clarity in its description. The paper is not comparing different numbers of decompression stops and their impact on DCS risk. It is comparing different *estimators* for DCS risk.
The idea is roughly this: Given the dive profile of an actual dive, how well can a certain method predict whether the diver is going to get DCS or not?
The 1, 3 and 6 *step* (not stop) estimators are used as a kind of baseline. In fact, the 6-step model is used as the null set. From the paper: "[...] but we will employ the 6-step depth set across all gases and breathing systems as the null set". That is why you can see in Table 4 an LLR of 0 for the 6-step estimator.
So what is this estimator, and what is the difference between the 1, 3 and 6 step estimators?
The simplest one is the 1-step estimator. You can compute this estimator yourself really easily: you simply take all the dives in the database, take the number of DCS incidents, and divide the latter by the former. This gives you the parameter p = 0.0077 in Table 4 (i.e. the incidence rate of DCS in the database used was 0.77%).
Now what is the 3-step estimator? In the 3-step estimator, the dives in the database are split up based on their maximum diving depth. Dives between 0 and 299 fsw are put into the first group, dives within 300-499 fsw into the second group, and dives to 500 fsw or more into a third group.
I'm a bit uncertain whether these were the exact groups used, as the numbers don't quite match up. Table 4 shows the DCS incidence rates for the three groups as 0.0054 (I'm actually calculating 0.0053 based on Table 3), 0.0080 (actual numbers based on Table 3: 0.0081) and 0.0112, respectively. It's that last number that baffles me, as I can't find any set of groups in Table 3 that generates it. I believe this is a mistake in the paper, as the actual incidence rate for the third group would have been 0.23.
Finally, the 6-step estimator uses the six depth groups shown in Table 3, and you can easily see that the DCS incidence rate increases the deeper the dives get.
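To make this concrete, here is a rough sketch (in Python) of how such step estimators could be computed. The dive records and the exact bin edges below are made up purely for illustration; they are not the paper's data, and as mentioned above I'm not certain about the exact depth groups:

```python
from collections import defaultdict

def step_estimator(dives, bin_edges):
    """Compute per-depth-bin DCS incidence rates from (max_depth_fsw, had_dcs) records.

    bin_edges are the lower edges of the depth bins in fsw, in increasing order.
    Returns a dict mapping lower edge -> incidence rate within that bin.
    """
    counts = defaultdict(lambda: [0, 0])  # lower edge -> [dcs count, dive count]
    for depth, had_dcs in dives:
        edge = max(lo for lo in bin_edges if depth >= lo)
        counts[edge][0] += int(had_dcs)
        counts[edge][1] += 1
    return {edge: n_dcs / n_total for edge, (n_dcs, n_total) in sorted(counts.items())}

# Made-up dive records just so this runs -- NOT the paper's database.
dives = [(120, False), (180, False), (250, True), (310, False),
         (450, True), (460, False), (520, True), (610, False)]

one_step   = step_estimator(dives, [0])                           # single bin: overall rate
three_step = step_estimator(dives, [0, 300, 500])                 # my guess at the 3-step bins
six_step   = step_estimator(dives, [0, 200, 300, 400, 500, 600])  # placeholder bins for Table 3
```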
Now when given a dive profile, we can use these estimators to "predict" the DCS risk for that particular dive.
For example, if we plan a dive to 450 fsw, we can derive the estimated DCS risks from the three estimators as follows (a tiny lookup sketch in code follows the list):
1-step: 0.0077
3-step: 0.0081 (since we are in the second depth group, 300-499 fsw)
6-step: 0.0140 (4/286, since we are in the fourth depth group 400-499 fsw)
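In code terms, each step estimator is just a piecewise-constant lookup on the maximum depth. Here is that lookup using only the numbers quoted above (I haven't reproduced the full 6-step table from Table 3, so only its 400-499 fsw value appears):

```python
def lookup(rate_by_lower_edge, max_depth_fsw):
    # Piecewise-constant lookup: use the rate of the deepest depth bin whose
    # lower edge the dive's maximum depth reaches.
    edge = max(lo for lo in rate_by_lower_edge if max_depth_fsw >= lo)
    return rate_by_lower_edge[edge]

one_step   = {0: 0.0077}                            # overall incidence rate (Table 4)
three_step = {0: 0.0054, 300: 0.0081, 500: 0.0112}  # per-group rates (see the caveats above)

print(lookup(one_step, 450))    # 0.0077
print(lookup(three_step, 450))  # 0.0081
print(4 / 286)                  # ~0.0140, the 6-step rate for the 400-499 fsw group
```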
Now these predictors are clearly very simple. Obviously, if you dive to 450 fsw and then don't do any decompression, you'll almost certainly get DCS. However, the step models don't take this into account: they would give us the same risk estimate for a dive with proper decompression as for one with no decompression at all, as long as the maximum depth is the same.
This is where the other considered models come into play. The USN, ZHL16, VPM and RGBM models take the actual dive profile into account, and the paper explains how each model's output for a given profile can in turn be used to estimate a DCS risk (with a risk estimator fitted to the database).
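I don't know the exact form of the paper's risk estimator, but as an illustration of the general idea, here is a minimal sketch that maps a single model-derived score per dive (whatever quantity the model computes for the whole profile) to a DCS probability via logistic regression. The scores and outcomes below are fabricated, and the paper's actual fitting procedure may well be different:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fabricated data: one scalar "model output" per dive (e.g. some supersaturation or
# bubble-volume index computed for the full profile) and the observed DCS outcome.
model_scores = np.array([0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.6])
had_dcs      = np.array([0,   0,   0,   0,   1,   0,   1,   1  ])

# Fit a simple logistic link from model output to DCS probability.
risk_fit = LogisticRegression().fit(model_scores.reshape(-1, 1), had_dcs)

# Estimated DCS probability for a new dive whose profile yields a score of 1.0:
p_dcs = risk_fit.predict_proba([[1.0]])[:, 1][0]
print(f"estimated DCS risk: {p_dcs:.3f}")
```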
The results are really interesting in my opinion. The DCS risk predictions made by the Haldanian models (USN and ZHL16) had a very low correlation with actual DCS incidents. They performed slightly better than the 3-step estimator, but significantly worse than the 6-step estimator. It is no surprise that the maximum dive depth is a good estimator for DCS incidence rates, but I would expect a model that has a lot more information than just the diving depth to perform better (though an argument could be made that the 6-step model might be subject to over-fitting, as training and test data sets are identical and it has 6 free parameters).
The two models that take bubble physics into account (VPM and RGBM) generated better predictions, coming close to the 6-step estimator, though still not quite matching it.
A quick note before I continue with some more interpretative remarks: I'm not a diver (though I've signed up for an open water class and am looking forward to it!). I'm also not a professional or researcher in any scientific discipline directly relevant to diving or medicine (I'm a computer scientist). Hence none of this should be taken as practical diving advice under any circumstances. I might be missing important aspects here.
If I'm not mistaken, this paper suggests that VPM and RGBM model the factors that determine whether DCS occurs in a given dive significantly better than the considered Haldanian models do. That being said, none of the considered models did particularly well at this task (all being worse than the trivial 6-step estimator).
Now it should be noted that this doesn't necessarily mean that any of those models generates good or bad decompression schedules. That said, in practice the Haldanian models need to be adapted ad hoc with gradient factors in order to give reasonable schedules, presumably because they don't inherently account for many of the factors that actually lead to DCS.
However, the results are still relevant to how useful the models are when generating schedules. In particular, the risk estimators derived from the models can be used to generate decompression schedules by means of optimization algorithms: you basically generate a lot of different candidate schedules, ask the model how high its estimated risk of DCS is for each of them, and then pick the one with the lowest estimated risk. I'm not sure if this is exactly what computer implementations of VPM and RGBM do, but they are at least doing something to the same effect.
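As a very rough illustration of that idea (not how any particular implementation actually works): assume we have some risk_estimate() function derived from one of the models; schedule selection is then just a search for the candidate with the lowest estimated risk. Both functions below are hypothetical placeholders:

```python
def risk_estimate(profile):
    # Hypothetical placeholder: pretend that more total stop time means lower
    # estimated risk, with diminishing returns. A real estimator would evaluate
    # the whole pressure/gas history of the profile.
    total_stop_minutes = sum(minutes for _depth, minutes in profile)
    return 0.05 / (1.0 + 0.1 * total_stop_minutes)

def candidate_schedules():
    # Enumerate a few candidate schedules as lists of (stop depth in fsw, minutes).
    for last_stop in (10, 20):
        for extra in (0, 5, 10):
            yield [(60, 5), (30, 10 + extra), (last_stop, 15 + extra)]

best = min(candidate_schedules(), key=risk_estimate)
print("picked schedule:", best, "estimated risk:", risk_estimate(best))
```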
As far as I can tell, none of this matters if you're diving a schedule that has been dived by a large number of divers before. In that case, a model that accurately predicts DCS risk isn't as relevant, because you can simply pick a schedule based on *actual* DCS incidence rates from divers who have previously dived similar schedules. Assuming the number of divers is large enough, this will always give more accurate predictions than any of these models. In practice you could, for example, use a Haldanian model and pick the GFs so that the resulting schedule resembles previously dived schedules for similar depth profiles / gas mixes.
Where the models matter is when attempting a dive that hasn't been dived before (by a sufficient number of divers). Here you need to rely on what a model predicts to derive a schedule with a hopefully low risk of DCS.
PS: Hypothetically, if I were reviewing this paper, I would also remark that the RGBM had an unfair advantage in this study, since I believe its model parameters were at least partially derived from a subset of the same database. It might hence also be subject to some over-fitting. While the paper states that "No attempts were made to optimize these parameters for correlation with data", I believe that an earlier version of the same database was used in the development of RGBM. Though I might be mistaken about this.