The tech stack behind Foursquare: the application and data infrastructure, utilities, DevOps, and business tools that Foursquare actually uses.
A primary gauge of system health is message queue length. The message queue length of every process on a node is monitored constantly, and an alert is sent if any queue accumulates a backlog beyond a preset threshold. When one or more processes fall behind, the alert names them, which points to the next bottleneck to attack.
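The threshold check on queue lengths can be illustrated with a minimal Python sketch. It is not the monitoring code of the system described; the function names `find_backlogged` and `check_node`, the process names, and the threshold value are all hypothetical, and the sketch assumes the per-process queue lengths have already been collected into a dictionary.

```python
from typing import Callable, Dict, List, Tuple


def find_backlogged(queue_lengths: Dict[str, int], threshold: int) -> List[Tuple[str, int]]:
    """Return (process, queue_length) pairs above the threshold, worst first."""
    over = [(name, length) for name, length in queue_lengths.items() if length > threshold]
    # Sort by backlog size so the likeliest bottleneck comes first.
    return sorted(over, key=lambda pair: pair[1], reverse=True)


def check_node(
    queue_lengths: Dict[str, int],
    threshold: int,
    alert: Callable[[str], None] = print,
) -> List[Tuple[str, int]]:
    """Send one alert per backlogged process and return the offenders."""
    backlogged = find_backlogged(queue_lengths, threshold)
    for name, length in backlogged:
        alert(f"ALERT: {name} has {length} queued messages (threshold {threshold})")
    return backlogged


if __name__ == "__main__":
    # Hypothetical snapshot of one node: process name -> message queue length.
    snapshot = {"router": 12, "db_writer": 950, "notifier": 4100}
    check_node(snapshot, threshold=500)
```

Sorting the offenders by backlog size mirrors the idea that the process furthest behind is the next bottleneck to investigate. A real deployment would run such a check periodically and route alerts to a paging system instead of `print`.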