I agree that regional differences and a relatively small sample size may very well make it difficult (impossible?) to achieve a high degree of statistical significance, let alone statistical near certainty, but it can still be "objectively quantified" (as you list: perceived pros/cons/risks/incidents, and perhaps also surveys of near misses).
I personally believe it would be pretty easy to demonstrate the need to revise certain dive flag regulations, even if it were simply having the regulations recognize that the rationale for requiring dive flags is a lot more complicated than for other seemingly "safety related" regulations, like requiring children to wear life jackets on a boat while it's in motion, and that "it entirely depends" whether there is a compelling safety need for the law to force dive flag use...