In December 2017, the exchange rate of the world’s most popular digital currency, Bitcoin, fell sharply. Investors who wanted to sell quickly, for fear of further declines, turned to Coinbase, one of the biggest cryptocurrency exchanges.
Coinbase, however, did not seem to share their urgency. Its service was unavailable for several hours, and investors switched to selling their Bitcoins on other exchanges. Coinbase took a double hit as a result: a direct loss of commissions on transactions that could not be executed, and a significant loss of customer trust. Many customers have since moved to competing exchanges and will likely continue to use them instead of Coinbase.
The Coinbase outage received a lot of coverage because of Bitcoin’s popularity, but the company isn’t alone in facing the harsh backlash of a service outage. Companies in various sectors frequently suffer from service disruptions. Their outages may not make news headlines, but they can carry other unfortunate consequences.
In early November 2017, the well-known corporate messaging service Slack suffered several hours of service interruptions. When a critical service for internal corporate communications stops functioning in real time, customers are quick to look for other solutions. They might even find that another solution suits them better, and abandon ship for the next provider.
In recent years, more and more digital services and online applications have invested in the rapid development of their products rather than in improving their availability and uptime. In addition, the use of advanced cloud services and state-of-the-art monitoring tools has created the impression that availability problems belong to the past. Many organizations mistakenly assume that at any given moment they can simply throw in more servers to improve service or application availability for users.
For paid services or apps intended for a business audience, service providers are held to agreed service times (set out in Service Level Agreements). Even the smallest disruption in service can lead to a loss of revenue or remind customers that it is time to check out competitors. Even for free services or applications, where the company is committed to open access, service disruptions can significantly damage the perceived reliability and consistency of the service and reduce how much it is used.
Despite the importance of continuous service availability, it is rare to find startups that treat availability as an integral part of their development and distribution process. Often, small and even larger companies take an “it will be okay” approach. This approach can easily mean relying on potentially unskilled technical personnel, using improperly configured diagnostics tools, and believing that there will always be someone who will wake up in the middle of the night to fix things.
Experience shows that Murphy’s Law works overtime, especially in the wee hours of the night. When a service malfunctions for a few hours and your organization’s teams do not wake up in time to address it, the reputational and financial damage to your business will be very difficult to repair.
The first thing a development team should do to improve its internal oversight of service availability is to constantly monitor and diagnose the relevant data. Tracking data on service availability is typically done by DevOps Engineers or, if your organization has them, by Network Operations Center (NOC) Engineers.
Many companies display dashboards on screens in various areas of the office, monitoring things like the rate at which customers join, how the available services are used, and more. The aim is to keep departments that are not directly involved in development up to date, so that they understand the effects of their business decisions in real time.
Screens in small startups sometimes present an interesting statistic in real time, such as customer revenue. Interestingly, however, such screens do not always show the revenue being lost when the service is temporarily unavailable. That figure is relatively easy to estimate: divide the monthly revenue by the number of minutes in the month, and you have a per-minute figure to use whenever a service disruption occurs.
Once the development team and DevOps Engineers are presented with a clear financial figure, painted red and carrying a minus sign that grows with every minute the service is down, employees can better internalize the true cost of downtime. And that is before considering the cost of the time and manpower diverted from scheduled tasks to repair the malfunction.
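The calculation described above can be sketched in a few lines of Python. This is only a rough illustration: the function names and the revenue figure are hypothetical, and it assumes revenue is spread evenly across every minute of the month, which is a simplification for most businesses.

```python
import calendar


def revenue_per_minute(monthly_revenue: float, year: int, month: int) -> float:
    """Monthly revenue divided by the number of minutes in that month.

    Assumption: revenue arrives evenly over the month (a simplification).
    """
    days_in_month = calendar.monthrange(year, month)[1]
    minutes_in_month = days_in_month * 24 * 60
    return monthly_revenue / minutes_in_month


def downtime_loss(monthly_revenue: float, year: int, month: int,
                  minutes_down: float) -> float:
    """Estimated loss as a negative number that grows with each minute of downtime."""
    return -revenue_per_minute(monthly_revenue, year, month) * minutes_down


def format_loss(loss: float) -> str:
    """Render the loss for a dashboard, keeping the minus sign."""
    return f"{loss:,.2f}"


if __name__ == "__main__":
    # Hypothetical figures: 432,000 in revenue for November 2017, 90 minutes down.
    loss = downtime_loss(432_000.0, 2017, 11, minutes_down=90)
    print(format_loss(loss))
```

A dashboard could call `downtime_loss` once a minute during an incident, passing the elapsed downtime, so that the displayed figure keeps growing until the service recovers.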
Service availability cannot be reduced to data alone, as successfully keeping services alive also depends on effective and targeted workflow processes inside your startup or company. Although many startups use agile development methods in one form or another, service availability is not always a priority among the items they maintain in an agile manner.
As part of the transition to service availability-oriented development, all product and development teams at your startup (e.g., product management, design, development, testing, DevOps, and even technical documentation) should be trained in understanding and advocating for the importance of service availability within the company.
A classic example of such a move could be seen at Microsoft in the early 2000s. At that time, the tech giant’s products suffered from many security issues that significantly damaged its reputation. Bill Gates led a transformational move, which resulted in company departments thinking about security at every stage of a product or service. This led to a large improvement in the security of its products and services, and security-oriented development took root in the company.
Microsoft’s bold move not only improved product and service security, but also reshaped and improved the entire development process, creating new ways of building entire products and services for customers. In the case of service availability-oriented development, this means that your entire startup could benefit significantly from a stronger commitment to service availability.
Often, such an internal business transformation leads to the discovery that there are no systematic development processes in place. From quality testing during development to fault detection in the final stages of a service’s development, developers may lack a general awareness of the importance of service availability across the entire product development cycle. In many cases, critical data and infrastructure also rest in the hands of a single person (e.g., an IT Administrator), creating a problematic dependency on one individual for the entire organization.
It is important to remember that, as with any work improvement process, one must be wary of so-called over-improvement. Emphasizing service availability alone may lead development teams to become overly cautious about updates and add-ons, in an effort to avoid compromising service availability. Another example is DevOps Engineers relentlessly pursuing a better trade-off between speed and service availability at the expense of other requirements (e.g., budgetary ones). Before implementing new processes in favor of stronger service availability, it is useful to establish standards for service availability goals, budgetary limits, and so forth.
The pace of technological change and users’ widespread demand for high and consistent service availability are both only rising. Heightened awareness within your startup, together with a technological infrastructure and managerial processes that take today’s service availability requirements into account, will help reduce development times and save company costs in the future.