lamont:
That's if everything goes well. If it goes poorly (consider something like the blockage in the LP inflator suddenly clears and the wing begins to auto-inflate) then you could get a mouthful of it... I just don't like the idea of sticking my mouth over a bag of gas that will kill me if I inhale it... call me crazy...
I dunno if I'm stretching here, but I see a really similar kind of mentality at work and in diving...
If everything goes right and it's a sunny day you have nothing to worry about. You won't breathe any argon. The filesystem won't be full. Nobody will ever rewrite that startup/shutdown script so that it can hang. You don't need to code timeouts, you don't have to check exit codes.
I just took a 50-line bash script and turned it into a 450-line Perl script, mostly by adding timeouts and return code checking along with a bunch of logging of its progress. I still haven't added a simple persisted on-disk state machine so that the script can recover when it gets interrupted, and there's a bunch more sanity checking that could still be done. It'll probably be about 1,000 lines when it's done but it'll be considerably more bulletproof and it'll ***** like crazy when anything goes wrong.
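For what it's worth, here is roughly what I mean by a persisted on-disk state machine. This is only a sketch in Python (not the Perl script itself), and the step names, the `state.json` layout, and the `run_steps` helper are all made up for illustration. The idea is to record each completed step on disk with a write-then-rename, so a rerun after an interruption picks up where the last good step left off.

```python
import json
import os
import tempfile

# Hypothetical step names, in the order they must run.
STEPS = ["drain", "stop_app", "upgrade", "start_app"]


def load_done(state_path):
    """Return the list of steps already completed, or [] on a fresh run."""
    try:
        with open(state_path) as f:
            return json.load(f)["done"]
    except FileNotFoundError:
        return []


def save_done(state_path, done):
    """Write state to a temp file, fsync it, then rename over the old state."""
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"done": done}, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename within one directory: readers see the old file or the new one,
    # never a half-written one.
    os.replace(tmp, state_path)


def run_steps(state_path, actions):
    """Run every step in STEPS that is not yet recorded as done."""
    done = load_done(state_path)
    for step in STEPS:
        if step in done:
            continue  # finished before the interruption, skip it
        actions[step]()  # may raise; the state file still names the last good step
        done.append(step)
        save_done(state_path, done)
    return done
```

If the script dies between finishing a step and saving the state, that step runs again on the next attempt, so in this sketch every step has to be safe to repeat.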
And a lot of it may look excessively paranoid. Right now the application shutdown scripts just touch a file (which the app should read if it is operating), then sleep for a few seconds and kill -9 the process if it's hung so badly it can't shut down. That process should never go badly, so it looks like there's no point wrapping my call to that shutdown script with a timeout -- but I can't guarantee that nobody in the future pushes out a version of the shutdown script with a hang in it. I want to be as future-proof as possible against anything bad happening.
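Wrapping a call like that is not much code. A minimal sketch in Python, assuming the shutdown script is an ordinary executable; the `run_checked` name and the logger name are mine, not anything from the real scripts. It logs what it is doing, enforces a timeout, and treats any nonzero exit code as a failure.

```python
import logging
import subprocess

log = logging.getLogger("shutdown")  # hypothetical logger name


def run_checked(cmd, timeout_s):
    """Run cmd; return True only if it exits 0 within timeout_s seconds."""
    log.info("running %s (timeout %ss)", cmd, timeout_s)
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        log.error("%s still running after %ss, killed", cmd, timeout_s)
        return False
    except OSError as exc:
        # Script missing, not executable, and so on.
        log.error("could not start %s: %s", cmd, exc)
        return False
    if result.returncode != 0:
        log.error("%s exited with %d", cmd, result.returncode)
        return False
    log.info("%s finished cleanly", cmd)
    return True
```

The caller gets a plain True/False and decides whether to retry, escalate, or abort. One caveat: the timeout only kills the direct child, so anything the script itself spawned would need separate handling.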
The other approach to writing scripts like this is to code it like it'll always be a sunny day and then fix little issues in it whenever they come up. A couple of days ago they removed an infinite loop they ran into: the script was waiting for connections to bleed off, they never did, and that didn't just fail a single server but hung the entire platform. There are other infinite loops with no timeouts still in the codebase that I haven't removed and that can still get triggered, though (in fact they only removed a condition that would cause a single server to hang, but didn't remove the infinite loop that hangs the whole platform when any single server hangs -- it's just waiting for a single server to figure out another way to hang again).
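The fix for that whole class of problem is to never wait without a deadline. A rough Python sketch of a bounded wait, where `wait_until` and its parameters are names I made up for illustration:

```python
import time


def wait_until(condition, timeout_s, poll_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it is true or timeout_s elapses. Returns a bool."""
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # give up and let the caller decide what to do
        # Never sleep past the deadline.
        sleep(min(poll_s, remaining))
```

So the connection drain becomes something like `wait_until(lambda: open_connections() == 0, timeout_s=300)`, with `open_connections` standing in for however the real script counts connections. A False return means one server gets failed out, instead of the whole platform sitting there forever.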
I see strong parallels to approaches to technical diving. Some people tend to assume it'll be a sunny day all the time... I like my **** to work flawlessly no matter how cloudy it gets...