During the health emergency, we have all seen the consumption of streaming services grow to numbers never seen before. On one hand, the situation is terrible: we watch nurses and medical staff fight without enough resources to win the battle. On the other, it's time to pitch in and adapt our infrastructure to high demand in a short time.
Internet slowdowns are happening all over the world, and although the effect is being mitigated really quickly, this issue exposes a few problems we had but never faced before. On top of the massive use of streaming services, online games and videoconferencing from our homes, when large bandwidth has always been concentrated on offices, DDoS attacks are congesting the network with more and more clients. This makes me wonder what the real number of botnets in the world is right now.
The biggest CDN providers invest millions of dollars every year to expand storage capacity, computing power and efficiency. Despite that investment, almost all regions have suffered traffic spikes that collapsed services and produced long response times during this quarantine. Even so, we have to agree that companies are scaling their infrastructure quickly thanks to services like Amazon AWS or technologies like Kubernetes (K8s).
The covid-19 crisis is driving the biggest internet upgrade in years. https://t.co/ohDPENY1pJ— MIT Technology Review (@techreview) April 17, 2020
I could mention a lot of cloud computing technologies here, but the post would get really long, and that wouldn't fit a title that says "Some tips to...". So I'll give you a brief overview of the tips that are, IMO, the most effective and the fastest to implement.
Most of the tips are valid for any kind of project or website, no matter the language. I mainly discuss infrastructure, how we create and organize code, and deployment.
Obviously, the longer client requests sit idle on our servers, the longer clients wait for a response. That's why it's worth putting some effort into reducing the resources needed to process each request. A first step might be to simplify your infrastructure layers, or to find what is keeping requests from being resolved. In my case, I started by removing unnecessary layers and health checks that only caused more traffic while the system was working fine. Now my health checker asks the application for its status, and only if the last request happened more than 1 minute ago do I make a curl call. Otherwise it would just be unnecessary.
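The traffic-aware health check described above could look roughly like the sketch below. It is not my exact setup: the class name, the probe URL and the `record_request` hook are hypothetical, and it assumes the application calls `record_request()` every time it serves a request successfully. Only the 1-minute threshold comes from the text.

```python
import time
import urllib.request

IDLE_THRESHOLD_SECONDS = 60  # "more than 1 minute ago", as described in the post


class TrafficAwareHealthCheck:
    """Skips the active HTTP probe while real traffic proves the app is alive."""

    def __init__(self, probe_url, clock=time.monotonic):
        self.probe_url = probe_url  # hypothetical health endpoint
        self._clock = clock
        self._last_request_at = None

    def record_request(self):
        # Assumed to be called by the application after each served request.
        self._last_request_at = self._clock()

    def needs_probe(self):
        # Probe when no traffic was ever seen or the last request is too old.
        if self._last_request_at is None:
            return True
        return self._clock() - self._last_request_at > IDLE_THRESHOLD_SECONDS

    def is_healthy(self, probe=None):
        if not self.needs_probe():
            return True  # recent traffic, no extra request needed
        return (probe or self._http_probe)()

    def _http_probe(self):
        # Equivalent of the curl call: any 2xx answer counts as healthy.
        try:
            with urllib.request.urlopen(self.probe_url, timeout=2) as resp:
                return 200 <= resp.status < 300
        except OSError:
            return False
```

The `probe` argument exists only so the check can be exercised without a network; in production the default HTTP probe would be used.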
A good monitoring system with properly configured alerts can help you detect saturation in your services, possible single points of failure and any unexpected overload. Remember also that nginx resolves a reverse proxy much faster than a typical virtual host pointing to the filesystem. Here are some interesting resources on Nginx optimizations, SSL and reverse proxies:
Enabling a CDN can reduce traffic to your origin server by 30-80% (depending on content type and TTL configuration). This is possible because the CDN servers sit between clients and your origin server. When a client requests an asset, the CDN looks for it in its cache; only if it's missing does the request reach our server, and the response is then kept in cache for next time.
You simply point your DNS records to the domains given by your CDN provider. Take a look at all the available options, which usually include Gzip compression, security options to block a list of ports or IPs, header injection and browser cache control. Pay special attention to the TTL value: depending on your traffic and type of content, you may want higher or lower lifetimes.
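As an illustration of choosing TTLs by content type, here is a minimal sketch that builds a `Cache-Control` header per file extension. The extensions and TTL values are assumptions for the example, not recommendations from any provider. `max-age` applies to browsers and `s-maxage` to shared caches such as a CDN.

```python
from pathlib import PurePosixPath

# (browser max-age, CDN s-maxage) in seconds. Hypothetical values:
# tune them to your own traffic and how often the content changes.
TTL_BY_EXTENSION = {
    ".html": (0, 60),           # pages change often: short CDN lifetime
    ".css": (86400, 604800),    # static assets: one day / one week
    ".js": (86400, 604800),
    ".png": (604800, 2592000),  # images: one week / thirty days
    ".jpg": (604800, 2592000),
}


def cache_control_for(path):
    """Return the Cache-Control header value for a request path."""
    ext = PurePosixPath(path).suffix.lower()
    if ext not in TTL_BY_EXTENSION:
        # Unknown or dynamic content: keep it out of every cache.
        return "no-store"
    browser_ttl, cdn_ttl = TTL_BY_EXTENSION[ext]
    return f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}"
```

For example, `cache_control_for("/static/app.css")` yields a long-lived public header, while `cache_control_for("/api/users")` yields `no-store` so the request always reaches the origin.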
Some of the best providers:
Source: CDNPerf Report (25-04-2020)
A CDN also gives you an extra security layer because it mitigates most DDoS attacks, which would otherwise directly impact your server's performance and the quality of service you give your customers. Consider paying for a CDN if your traffic or transferred data will exceed the free plan limits.
In my case, I found an old project set up as a monorepo that depended on the stack installed locally for development. I'm quite lazy, so I don't like managing many installations (yeah, I have many laptops) for the same thing, because many different things might fail.
Containers solve this problem for good and give you some benefits that you already know (yeah, you've heard of them). Invest some time in the initial setup and then all team members will reuse the same configuration, promoting continuous development and removing any work stoppages. Remember to use Docker Secrets instead of environment variables, because env vars can end up visible in logs. Once you've done all this, it has never been simpler to create new replicas of your container in Kubernetes, Amazon EKS or Docker Swarm.
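On the application side, reading a Docker secret is just reading a file. Docker mounts each secret granted to a service under `/run/secrets/<name>` inside the container. The sketch below assumes that layout; the helper name and the `db_password` secret are hypothetical.

```python
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets"):
    """Read a secret from the directory where Docker mounts secrets."""
    path = Path(secrets_dir) / name
    try:
        # Strip the trailing newline that often sneaks in when the
        # secret is created from a file or from stdin.
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        raise RuntimeError(f"secret {name!r} is not mounted at {path}") from None


# Usage inside a container (hypothetical secret name):
# db_password = read_secret("db_password")
```

Failing loudly when the secret is missing avoids silently falling back to an environment variable, which is exactly what we are trying to get away from.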
Consider this the lowest priority of all the above. Orchestration requires real experts before moving to the cloud in production, so although it brings a noticeable performance improvement and lets you scale nodes quickly, a bare cluster will achieve the same effect.
I suggest that all software engineers rethink the way we deliver our projects. So far we have been building multiple services on top of a single third-party service for mailing, authentication, video transcoding or anything else you can imagine where we externalize part of our software. Don't get me wrong, I agree that these kinds of services keep us from reinventing the wheel, but if many, many projects depend on the same platform, it's quite possible that its network will be saturated and the service will become unavailable. Hence our software should be able to switch to alternative third-party services and be ready for any catastrophe.
Prepare a strategy, add the missing parameters and create alternative accounts for those external services whose outages may affect our project. By doing this you will be able to respond quickly to your clients and keep high availability even when the problem lies beyond our own infrastructure.
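A minimal sketch of that switch, assuming each external service is wrapped in a plain function that raises an exception when the provider is down. The function and provider names are hypothetical and no real vendor API is used.

```python
class AllProvidersFailed(Exception):
    """Raised when every configured provider rejected the call."""


def send_with_failover(providers, message):
    """Try each (name, send_function) pair in order; return the name that worked."""
    errors = []
    for name, send in providers:
        try:
            send(message)
            return name
        except Exception as exc:  # any provider error triggers the next fallback
            errors.append((name, exc))
    raise AllProvidersFailed(errors)


# Usage with hypothetical wrappers around two mailing services:
# used = send_with_failover(
#     [("primary_mail", send_via_primary), ("backup_mail", send_via_backup)],
#     {"to": "client@example.com", "subject": "Hello"},
# )
```

Keeping the provider list in configuration, with the alternative accounts already created, is what makes the switch fast when the primary service is saturated.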
To keep our systems healthy, it's important to stay informed and set alarms for critical or accumulated errors. These logs are very useful for debugging and analyzing the weaknesses in our infrastructure or software, and they let you find possible bottlenecks that prevent our system from handling many requests in a short time.
If your project is built as microservices, you might be interested in unifying the log output of all those "black boxes" in a single common place. Tools like Kafka do this job really well, collecting, processing and storing messages from many different sources. Alternatively, you can opt for something more advanced such as Metrictank, Datadog or Prometheus to centralize your logs and metrics in one place in the cloud. It's also advisable to create load tests to check how much load your application can handle. Load Impact is my favourite tool because the UI is really simple, it supports different algorithms for making requests, and it has a free plan to get started (max 50 requests).
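If you just want a rough first number before reaching for a dedicated tool, a load test can be sketched with the standard library alone. This is a simplified stand-in, not how Load Impact works: `request_fn` is a hypothetical callable that performs one request against your application and raises on failure.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(request_fn, total_requests, concurrency):
    """Call request_fn total_requests times using `concurrency` worker threads."""
    if total_requests < 1 or concurrency < 1:
        raise ValueError("total_requests and concurrency must be >= 1")

    def timed_call(_):
        start = time.perf_counter()
        try:
            request_fn()
            ok = True
        except Exception:
            ok = False  # count the failure, keep the test running
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(duration for _, duration in results)
    # Nearest-rank 95th percentile of the observed latencies.
    p95_index = math.ceil(0.95 * len(latencies)) - 1
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "p95_seconds": latencies[p95_index],
        "max_seconds": latencies[-1],
    }
```

Raising `concurrency` step by step and watching where `failures` or `p95_seconds` start to climb gives a first idea of where the bottleneck is.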
We are certainly living through an exceptional situation, so if you already have performance reports and analysis prepared for future emergencies, you can use that knowledge to take proper action in the areas you expect to be affected, and thus avoid scenarios like: