At work we have a process that must be followed to get changes into production. It’s not a foreign concept; just about any respectable company has a process, and some are just stricter than others. If processes were measured on a scale ranging from none to overboard, I’d say we’re somewhere between middle of the road and overboard.
Well, today I found that some of our application server logs from a sub-environment were not making it to our Splunk servers. After troubleshooting the problem, I found that the Splunk agent didn’t have permissions and would have to be bounced. Bouncing a Splunk agent isn’t a problem in and of itself, except that we have some processes running that have to be stopped beforehand. Long story short, it’s a pain to do, but thankfully I have a fix to make life good for all parties involved! It’s past the commit for next week’s release, but we have a way of getting these changes in. This is a low-risk change: comment out a section in an XML file. Enter overboard process:
- Test the fix in the testing environment
- Create RFC (Request for Change)
- Wait for approval…
- Create outage ticket linking it to the approved RFC even though the problem has been there for 9+ months and nobody noticed…
- Create BCR (Baseline Change Request) for approved RFC and outage ticket
- Speak to an approval board to move forward
- Check in change
- Submit build request (configuration has to be built)
- Wait for approval…
- Deploy
Ten steps! That’s 10 steps and about 4 hours of paperwork, phone calls, and putting cover sheets on TPS reports for something that could have been done in literally 20 minutes. I don’t mind submitting build requests or a single change request form; some paperwork can be a good thing. But I will never understand why I have to create an outage ticket for fixing a problem that I’m already going to have to speak to.

