Muhib's Blog

The essence of writing clean code: Part I

Ahmed Sadman Muhib — Sat, 14 Oct 2023 19:55:52 GMT

Writing code is a form of art 🎨. Following the basic principles of clean code can help you reach that level of artistry. If you write code for a living 👩‍💻, then sooner or later clean code will matter a lot to you. The concepts shown here are the gist of the book Clean Code: A Handbook of Agile Software Craftsmanship 📖 by Robert C. Martin.

Engineers spend far more time reading code than writing it 🤯; in Clean Code, Robert C. Martin puts the ratio at well over 10 to 1. In a work environment, it's important that your colleagues can understand (and maintain) your code easily.

Principles of writing clean code

1. Naming 🪧

The simplest requirement, yet the most difficult to achieve 🫠 If naming isn't done properly, the rest of the clean code principles don't matter much, so this section will be a bit longer than the others. When naming your variables, functions, classes and so on, keep in mind that the names should be:

Descriptive but to the point
Indicative of what kind of data is being stored

Variables and Properties

Now, look at the following Python code:

# BAD
ca = db.Column(db.DateTime, nullable=True)
# GOOD
created_at = db.Column(db.DateTime, nullable=True)
# ---------
# Suppose we're creating a user with admin-level permission
user = createAdminUser(...)  # BAD, not enough context
userWithAdminPermission = createAdminUser(...)  # OKAY, long variable names are less preferred
admin_user = createAdminUser(...)  # GOOD
super_user = createAdminUser(...)  # GOOD

A very simple yet strong example. Abbreviating or shortening variable names only creates confusion. Remember, when you're writing the code, only you have the full context, so write it in such a way that people without that context can still understand it easily 🤝

Now let's look at another example:

# Example 1
# Storing raw user data as a Python dictionary
user_data = {
    'name': 'Muhib',
    'age': 27,
    'email': 'test@email.com'
}
# Example 2
# Creating a SQLAlchemy (ORM) User DB object
user_data = User(name, age, email)  # BAD (arguable)
user = User(name, age, email)  # GOOD
db_user = User(name, age, email)  # GOOD

Here, user_data can mean anything: it could be a Python dictionary or a database (db) object. Our goal is to be more specific 🎯. In SQLAlchemy code, a plain variable named user conventionally refers to a model instance (a database object). To make it even more specific, we can name it db_user.

Although variables and properties are usually nouns, sometimes we should include adjectives as well to further clarify the context 😇. This is mostly true for boolean values:

from dataclasses import dataclass
# -- BAD --
@dataclass
class UserDetails:
    name: str
    login: bool
    premiumSubscription: bool
# -- GOOD --
@dataclass
class UserDetails:
    name: str
    isLoggedIn: bool
    hasPremiumSubscription: bool

In this code, we're storing the user's login state and whether the user is subscribed to a premium package.

Functions and Methods

The same principles apply to functions and methods, except that you should incorporate verbs as well. For example, login(), createUser() and database.insert() are good names.

Don't use names like user() or email(), as they sound like properties. Prefer getUser() and getEmail() instead. Remember to be specific: if you're creating a user, use createUser() instead of create().

Avoid generic names. processTransaction() or processUserRecord() is far better than processData().

Make sure that you're consistent with naming patterns. If you use fetchUser in your code, stick with the fetch* prefix throughout the code. Don't mix fetchUser and getProducts as they can create confusion. Use fetchUser and fetchProducts to be consistent.

Classes

Class names, by convention, should be nouns. If one of your classes handles creating new users, then UserFactory (noun) is better than CreateUser (verb).

# User creation class
class CreateUser: ...  # BAD
class BuildUser: ...  # BAD
class UserFactory: ...  # GOOD
class User: ...  # OKAY, not bad, not good (arguable)
# Process website transactions
class Transaction: ...  # BAD, we're not creating a transaction, we're processing it
class HandleTransaction: ...  # BAD, verb
class ProcessTransaction: ...  # BAD, verb
class TransactionHandler: ...  # GOOD
class TransactionProcessor: ...  # GOOD

Casing

Name	Example	Used in
Camel Case	fetchProduct, getUser	JavaScript
Snake Case	fetch_product, get_user	Python
Pascal Case	FetchProduct, GetUser	C#

These are the most common casing conventions. Follow the conventions of the language you're writing in so that your code fits in with its community; consult the language's style guide (for example, PEP 8 for Python).

If you have been attentive for the past few minutes, you might have noticed that I violated this rule in some of the code examples above. Can you find it? 🔎

Enough of naming, let's move forward to the next one

2. Code Formatting 💻

A beginner's mistake 🧑‍🦱. Beginners usually know about formatting practices but aren't disciplined enough to follow them. There's nothing fancy here, so I'm linking a good article that explains formatting: Clean Code - Formatting. If you're interested, have a look!

3. Writing better functions 🔨

Keep it short

Functions should be relatively short. To keep things simple, I won't give a specific line count; the point is that functions should be easy to read and easy to understand. Consider the following Python example, which handles uploading images to a file server.

def upload_image(image_data):
    image_fields = image_data.subdict(['title', 'owner_id'])
    image = Image(image_fields)
    db.session.add(image)
    db.session.commit()
    version_fields = image_data.subdict(['title', 'mime_type', 'blob'])
    image_version = ImageVersion(version_fields)
    db.session.add(image_version)
    db.session.commit()
    tags = generate_tags(image_version)
    image_version.update_tags(tags)
    webhook_url = "https://example.com/webhook"
    payload = {
        "asset_id": image.id,
        "version_id": image_version.id,
        "tags": tags
    }
    response = requests.post(webhook_url, json=payload)
    if response.status_code != 200:
        raise WebhookError('Notify failed')

If you study it for a while 😖, the code is not difficult to understand at all. But our goal is to reduce thinking time and avoid putting unnecessary load on our brains 🧠. So let's refactor the code above 💡:

# After refactor
import logging
import requests
logger = logging.getLogger(__name__)
def upload_image(image_data):
    image = create_asset(image_data, Image)
    image_version = create_version(image_data)
    try:
        notify_webhook(image, image_version)
    except WebhookError:
        logger.warning('Failed to notify')
def create_asset(asset_fields, model):
    fields = asset_fields.subdict(['title', 'owner_id'])
    asset = model(fields)
    db.session.add(asset)
    db.session.commit()
    return asset
def create_version(version_data):
    fields = version_data.subdict(['title', 'mime_type', 'blob'])
    version = Version(fields)
    db.session.add(version)
    db.session.commit()
    tags = generate_tags(version)
    version.update_tags(tags)
    return version
def generate_tags(version):
    # Placeholder: tag-generation logic goes here
    return []
def notify_webhook(asset, version):
    webhook_url = "https://example.com/webhook"
    payload = {
        "asset_id": asset.id,
        "version_id": version.id,
        "tags": version.tags
    }
    response = requests.post(webhook_url, json=payload)
    if response.status_code != 200:
        raise WebhookError('Notify failed')

We basically extracted each concept into its own function and tried to make them generic. After refactoring, just look at the upload_image function. It's so concise and easy to understand 🥳. Yes, the number of lines might have increased a bit. But now, not only is the code easier to understand, many of the functions are reusable as well 🐼. If we add a new function upload_video, we should be able to reuse them and keep the flow simple. Furthermore, another engineer can now skim the code and dig deeper only where needed; previously, they had to read the whole thing to get the gist. (Note that the refactor also changes one behavior: a failed webhook notification is now logged instead of raised.) There are more complex forms of refactoring, but those are out of scope for this article.
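To make the reuse claim concrete, here is a minimal, self-contained sketch of what upload_video could look like next to upload_image. It swaps the real pieces for hypothetical stand-ins so it runs on its own: save() plays the role of db.session.add()/commit(), pick() replaces the subdict helper, Image/Video/Version are plain dataclasses, and notify_webhook returns its payload instead of POSTing it. None of these names come from a real library.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for db.session.add() + commit():
# every saved object gets the next integer id.
saved_objects = []

def save(obj):
    saved_objects.append(obj)
    obj.id = len(saved_objects)
    return obj

@dataclass
class Image:
    title: str
    owner_id: int
    id: int = 0

@dataclass
class Video:
    title: str
    owner_id: int
    id: int = 0

@dataclass
class Version:
    title: str
    mime_type: str
    blob: bytes
    tags: list = field(default_factory=list)
    id: int = 0

def pick(data, keys):
    # Stand-in for the hypothetical image_data.subdict([...]) helper
    return {key: data[key] for key in keys}

def create_asset(asset_data, model):
    return save(model(**pick(asset_data, ['title', 'owner_id'])))

def create_version(version_data):
    version = save(Version(**pick(version_data, ['title', 'mime_type', 'blob'])))
    version.tags = generate_tags(version)
    return version

def generate_tags(version):
    # Placeholder tagging: use the top-level MIME type ("image", "video", ...)
    return [version.mime_type.split('/')[0]]

def notify_webhook(asset, version):
    # Stub: return the payload instead of POSTing it to a webhook URL
    return {'asset_id': asset.id, 'version_id': version.id, 'tags': version.tags}

def upload_image(image_data):
    image = create_asset(image_data, Image)
    version = create_version(image_data)
    return notify_webhook(image, version)

def upload_video(video_data):
    # Same flow as upload_image; only the asset model differs
    video = create_asset(video_data, Video)
    version = create_version(video_data)
    return notify_webhook(video, version)
```

On a fresh run, upload_video({'title': 'Intro', 'owner_id': 1, 'mime_type': 'video/mp4', 'blob': b''}) returns {'asset_id': 1, 'version_id': 2, 'tags': ['video']}. The only difference between the two upload functions is the model passed to create_asset, which is exactly the kind of reuse the refactor buys us.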

Make it read like an instruction manual

There's another characteristic of the refactored code. Taking upload_image as the entry point, the code reads like a step-by-step instruction manual, because the functions are placed one after another in the order they are called. We can use the TO pattern (Clean Code calls these "TO paragraphs") to read the code and check ourselves.

TO upload_image, we need to call create_asset, create_version and notify_webhook (sequentially)
TO create_version, we need to call generate_tags
DONE

If you look at the function ordering, you can see the functions come one after another: upload_image -> create_asset -> create_version -> generate_tags -> notify_webhook. Think of it as depth-first search, but for reading functions 📥
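The depth-first analogy can be checked mechanically. The sketch below assumes a hand-written call graph (CALLS is illustrative, not extracted from the code automatically) and walks it depth-first in pre-order, starting from upload_image; the result matches the ordering above.

```python
# Call graph of the refactored example: each function -> functions it calls, in order
CALLS = {
    'upload_image': ['create_asset', 'create_version', 'notify_webhook'],
    'create_version': ['generate_tags'],
}

def reading_order(entry_point, calls):
    """Return function names in depth-first (pre-order) order from entry_point."""
    order = []

    def visit(name):
        if name in order:  # each function is defined only once in the file
            return
        order.append(name)
        for callee in calls.get(name, []):
            visit(callee)

    visit(entry_point)
    return order

print(' -> '.join(reading_order('upload_image', CALLS)))
# upload_image -> create_asset -> create_version -> generate_tags -> notify_webhook
```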

But the big question is

When should you extract a group of code to a separate function?

In the last example, we just extracted different concepts to different functions. So it was simple. But in many cases, it might get more complicated 🚩. The next article in this series will discuss this in detail. That's all for today! Get your feet wet and start practising clean coding today 🫡 Thank you! 🙏

Why you should not use JWT for authentication

Ahmed Sadman Muhib — Tue, 14 Feb 2023 18:54:43 GMT

JWT (JSON Web Token) is a well-known and simple method of authenticating users. Almost every backend tutorial you've followed, especially API-building ones, probably told you to use JWT. I also once thought that JWT was the modern and better authentication method. Of course, that was a gap in my knowledge. This article tries to shed some light on this and show you some alternatives to JWT that might suit you better, depending on your needs.

What is JWT?

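JWT (JSON Web Token) is a stateless token containing user information (claims), signed either with a shared secret (HMAC, e.g. HS256, as in the example below) or with a private key whose matching public key is used for verification (e.g. RS256). As such, an intruder cannot modify the token without invalidating the signature. Note, however, that the payload is only base64url-encoded, not encrypted, so anyone holding the token can read it.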
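To see why tampering breaks the signature, here is a minimal HS256 (HMAC-SHA256) sketch using only Python's standard library. The helper names and the secret are hypothetical, and it skips checks a real library performs (such as validating alg and exp); in production you would use a maintained library such as PyJWT instead.

```python
import base64
import hashlib
import hmac
import json

def b64url_encode(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing '=' padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')

def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {'alg': 'HS256', 'typ': 'JWT'}
    segments = [b64url_encode(json.dumps(part, separators=(',', ':')).encode())
                for part in (header, payload)]
    signing_input = '.'.join(segments)
    signature = hmac.new(secret, signing_input.encode('ascii'), hashlib.sha256).digest()
    return f'{signing_input}.{b64url_encode(signature)}'

def verify_hs256(token: str, secret: bytes) -> bool:
    # Recompute the signature over "header.payload" and compare in constant time
    signing_input, _, signature = token.rpartition('.')
    expected = hmac.new(secret, signing_input.encode('ascii'), hashlib.sha256).digest()
    return hmac.compare_digest(b64url_encode(expected).encode(), signature.encode())

secret = b'server-side-secret'  # hypothetical key, never sent to the client
token = sign_hs256({'sub': '42', 'admin': False}, secret)

# An attacker flips a claim but cannot produce a matching signature without the secret
header, _, signature = token.split('.')
forged_payload = b64url_encode(json.dumps({'sub': '42', 'admin': True}).encode())
forged = f'{header}.{forged_payload}.{signature}'

print(verify_hs256(token, secret))   # True
print(verify_hs256(forged, secret))  # False
```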

A JWT looks like this:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

This example is taken from jwt.io. You will find the decoded version if you visit the page. It contains sub (user identification), name and iat (issued-at time). Tokens usually also carry an exp (expiration time) claim, after which the token must be rejected.
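Decoding the example token needs no key at all, which is a handy way to see what the header and payload contain. A minimal sketch with the standard library (decode_segment is a hypothetical helper name): each of the first two dot-separated segments is base64url-encoded JSON with its = padding stripped.

```python
import base64
import json

TOKEN = ('eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9'
         '.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ'
         '.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c')

def decode_segment(segment: str) -> dict:
    # Re-add the '=' padding that JWTs strip, then base64url-decode and parse the JSON
    padded = segment + '=' * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

header_b64, payload_b64, signature_b64 = TOKEN.split('.')
print(decode_segment(header_b64))   # {'alg': 'HS256', 'typ': 'JWT'}
print(decode_segment(payload_b64))  # {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
```

This is also why a JWT payload should never hold secrets: anyone with the token can read it. Only the third segment, the signature, requires the key to verify.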

This image can help you understand the breakdown

Advantages of JWT

Stateless: No data needs to be persisted on the server to authenticate a user; verifying the token's signature is enough.

Self-contained data: You don't need any other source of truth to verify your user. Claims let you add extra data, such as whether the user is an admin.

Disadvantages of JWT

Client-controlled logout

With JWT, the biggest problem is that there's no reliable way to log users out. Logout is fully controlled by the client; the server can only hope that the client forgets the token. Until it expires, a stolen or leaked token stays valid, which is dangerous from a security perspective.

You can, of course, use short-lived tokens and refresh tokens to fetch new JWTs. Let's say your token expires after 2 minutes. Now you have to handle the complexity of refresh tokens and issue a new token every 2 minutes, which adds significant overhead to your front end and, as your application grows, cost on your servers. Above all, you still can't revoke a user's access immediately: an issued token stays valid until it expires.

Blacklisting user sessions

Maybe the user flagged some logins as unauthorized, a login looked suspicious, or a device was lost or is no longer used. In that case, you would want to blacklist that token. To do this, you have to store the blacklisted tokens (or their IDs) in a persistent database and check against them every time a user needs authorized access. This adds overhead. Besides, storing blacklisted tokens is equivalent to maintaining state on the server, which somewhat defeats the purpose of JWT.
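A rough sketch of such a blacklist (often called a denylist), assuming each token carries the registered jti (JWT ID) and exp claims and that its signature has already been verified. The in-memory dict stands in for the persistent store, such as a database or Redis, that you would need in practice; the function names are hypothetical.

```python
import time
from typing import Optional

# jti (JWT ID) -> the token's exp, so entries can be purged once the token expires anyway
revoked: dict = {}

def revoke(claims: dict) -> None:
    revoked[claims['jti']] = claims['exp']

def is_allowed(claims: dict, now: Optional[float] = None) -> bool:
    # Assumes the signature was already verified; this only checks expiry and revocation
    now = time.time() if now is None else now
    if claims['exp'] <= now:
        return False
    return claims['jti'] not in revoked  # server-side lookup on every request

def purge_expired(now: Optional[float] = None) -> None:
    now = time.time() if now is None else now
    for jti, exp in list(revoked.items()):
        if exp <= now:
            del revoked[jti]
```

Note the lookup in is_allowed: every authorized request now consults server-side state, which is exactly the trade-off described above. Entries can be purged once exp passes, because an expired token is rejected anyway.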

Manually adding it to requests

You have to attach the JWT to every request yourself, typically in an Authorization: Bearer <token> header, or configure your HTTP client to do it for you.
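For illustration, this is what the manual step looks like with Python's urllib.request: every outgoing request has to carry the token in an Authorization header using the Bearer scheme. authorized_request is a hypothetical helper, the URL is a placeholder, and the token is truncated on purpose.

```python
import urllib.request
from typing import Optional

def authorized_request(url: str, token: str, data: Optional[bytes] = None) -> urllib.request.Request:
    # The JWT travels in the Authorization header using the Bearer scheme
    return urllib.request.Request(url, data=data, headers={'Authorization': f'Bearer {token}'})

req = authorized_request('https://api.example.com/profile', 'eyJhbGciOi...')
print(req.get_header('Authorization'))  # Bearer eyJhbGciOi...
```

Forget the helper on one call and that request goes out unauthenticated, which is the kind of bookkeeping that cookies avoid.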

Storage security

Many developers store session tokens in localStorage, which any script injected through an XSS (cross-site scripting) attack can read. You can store the token in a secure, HttpOnly cookie instead, but that solves only one of these problems.

So, what is the solution?

The solution: Good old session cookies

Session cookies are underrated. They are old but gold.

This is how it works: when a user logs in with the correct credentials, the server generates a random, unguessable session token, stores it together with the user ID and an expiry time, and sends the token back to the browser as a cookie.

user_id	session_token	expiry
2	47a0479900504cb3ab4a1f626d174d2d	1516239022
2	0050479900504cb3ab4a1f626d171f62	1516239822
5	0504cb39900504cb3ab4a1f626d171f6	1556239822

User 2 is logged in on two different devices, hence two sessions. Only the session_token, which is random and unpredictable, is sent to the browser as a cookie. The expiry column holds Unix timestamps.
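The table above can be modelled with a few lines of Python. This is a minimal in-memory sketch under stated assumptions: a dict stands in for the sessions table (or Redis), the one-week SESSION_TTL is arbitrary, and all function names are hypothetical. secrets.token_hex(16) produces 32 random hex characters, the same shape as the tokens in the table.

```python
import secrets
import time
from typing import Optional

SESSION_TTL = 7 * 24 * 3600  # hypothetical lifetime: one week, in seconds

# session_token -> (user_id, expiry as a Unix timestamp), like the table above
sessions: dict = {}

def create_session(user_id: int, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    token = secrets.token_hex(16)  # 32 random hex characters
    sessions[token] = (user_id, now + SESSION_TTL)
    return token  # sent to the browser as the cookie value

def user_for_session(token: str, now: Optional[float] = None) -> Optional[int]:
    now = time.time() if now is None else now
    entry = sessions.get(token)
    if entry is None or entry[1] <= now:
        return None  # unknown, deleted or expired session
    return entry[0]

def delete_session(token: str) -> None:
    # Logout: the server simply forgets the session
    sessions.pop(token, None)

def delete_all_sessions(user_id: int) -> None:
    # "Log out everywhere", e.g. when a device is reported lost
    for token, (owner, _) in list(sessions.items()):
        if owner == user_id:
            del sessions[token]
```

delete_session is all a server-side logout needs, and delete_all_sessions covers the lost-device case: the next request carrying a deleted token simply fails the lookup.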

Does managing this seem complex? Not really. Popular web frameworks provide abstractions for cookie-based sessions. Later in this article, we will see how to handle session cookies with Flask.

Advantages of session cookies

Server-controlled logout

Whenever a user needs to log out, the server can just delete the session entry, and that's it. The user gets logged out immediately, totally controlled by the server. This is a more reliable solution compared to JWT.

Blacklisting

This is also easy. Why keep a list of blacklisted sessions when you can just delete them? If a user marks his device as lost, you can delete all the sessions of that user, forcing him to log in again with his credentials.

Auto-attachment

Because of how cookies work, you don't have to attach the cookie to each request yourself. Once the browser receives a cookie, it sends it automatically with every subsequent request to the same site. When the cookie is set with the HttpOnly flag, JavaScript cannot read or modify it, and the Secure flag ensures it is only sent over HTTPS.
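Whether JavaScript can read a cookie depends on the flags it is set with. A small sketch using Python's http.cookies module to build the Set-Cookie header for a session token; the cookie name session, the helper name and the SameSite=Lax choice are assumptions, not requirements.

```python
from http.cookies import SimpleCookie

def session_cookie_header(token: str) -> str:
    cookie = SimpleCookie()
    cookie['session'] = token
    cookie['session']['httponly'] = True   # not readable from JavaScript (document.cookie)
    cookie['session']['secure'] = True     # only sent over HTTPS
    cookie['session']['samesite'] = 'Lax'  # withheld on most cross-site requests
    cookie['session']['path'] = '/'
    return cookie.output()

print(session_cookie_header('47a0479900504cb3ab4a1f626d174d2d'))
```

In Flask you would normally not build this header yourself: the SESSION_COOKIE_HTTPONLY (on by default), SESSION_COOKIE_SECURE and SESSION_COOKIE_SAMESITE configuration keys control these flags for the session cookie.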

Disadvantages of session cookies

Increased database read/write

Sessions are persisted in a database, so every authenticated request needs a lookup, which adds load as you scale. In practice, sessions are often kept in a fast in-memory store such as Redis, which largely solves this issue.

CSRF Attack

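If you don't take preventive measures, session cookies are vulnerable to cross-site request forgery (CSRF), because the browser attaches the cookie even to requests triggered by another site. For dynamic (server-rendered) websites, CSRF tokens are the usual defense. For REST APIs, the SameSite cookie attribute combined with a properly configured CORS (cross-origin resource sharing) policy helps block unwanted cross-site requests.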
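The CSRF token idea (the synchronizer token pattern) fits in a few lines. This sketch assumes a dict-like server-side session and uses hypothetical function names; in a Flask app you would more likely reach for an extension such as Flask-WTF's CSRFProtect than roll your own.

```python
import hmac
import secrets
from typing import Optional

def issue_csrf_token(session: dict) -> str:
    # One random token per session, stored server-side and embedded in each form/page
    token = secrets.token_urlsafe(32)
    session['csrf_token'] = token
    return token

def is_valid_csrf(session: dict, submitted: Optional[str]) -> bool:
    # A forged cross-site request carries the victim's cookie automatically,
    # but the attacker's page cannot read this token, so it cannot submit it
    expected = session.get('csrf_token')
    if not expected or not submitted:
        return False
    return hmac.compare_digest(expected.encode(), submitted.encode())
```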

Implementing a REST-like API with Flask and session cookies

I will demonstrate a simple example with Flask to use session cookies. We will use Flask-Login, which will manage the session cookies for us.

First, we will create a User model and some mock database methods

# user.py
from flask_login import UserMixin
class User(UserMixin):
  def __init__(self, id: int, name: str, email: str, password: str):
    self.id = id
    self.name = name
    self.email = email
    self.password = password  # in a real application, password would be hashed
  def get_id(self):
    # required by flask_login
    return str(self.id)
# Mock DB
u1 = User(1, 'TestUser', 'test@email.com', 'weakpassword')
u2 = User(2, 'SecondUser', 'rest@email.com', 'pass2')
users = [u1, u2]
def get_user(id):
  for user in users:
    if user.id == id:
      return user
  return None

UserMixin provides default implementations of the properties and methods Flask-Login expects from a user class (is_authenticated, is_active, is_anonymous and get_id). The get_user function acts as a fake database lookup.

Next, we need to instantiate LoginManager(). We will use a separate file for this:

# login_manager.py
from flask_login import LoginManager
from user import get_user
login_manager = LoginManager()
@login_manager.user_loader
def load_user(user_id):
  # in a real application, you would fetch the user from DB
  _id = int(user_id)
  return get_user(_id)
@login_manager.unauthorized_handler
def unauthorized():
  return {'message': 'Unauthorized'}, 401

Flask-Login calls load_user with the user ID stored in the session to reload the user object on each request. How you return the user is entirely up to you. Here, we return the user from our mock DB; in a real application, you would fetch it using your ORM (SQLAlchemy, in most cases).

Finally, the main.py file will be used to include the routes and start the server.

# main.py
from flask import Flask, request
from flask_login import login_user, logout_user, login_required, current_user
from user import get_user
from login_manager import login_manager
app = Flask(__name__)
app.config['SECRET_KEY'] = 'some-nice-secret-key'  # use a long random value in production
login_manager.init_app(app)
@app.route('/heartbeat', methods=['GET'])
def heartbeat():
  return {'status': 'OK'}
@app.route('/login', methods=['POST'])
def login():
  data = request.get_json()
  user = get_user(data['user_id'])
  if not user:
    return {'message': 'User not found'}, 400
  # plain-text comparison only because this is a mock; compare password hashes in real code
  if user.email == data['email'] and user.password == data['password']:
    login_user(user)
  else:
    return {'message': 'Invalid credentials'}, 400
  return {'message': 'User logged in'}
@app.route('/logout', methods=['POST'])
@login_required
def logout():
  logout_user()
  return {'message': 'Logged out'}
@app.route('/profile/<int:user_id>', methods=['GET'])
@login_required
def profile(user_id):
  user = get_user(user_id)
  if not user or current_user.id != user.id:
    return login_manager.unauthorized()
  return {'name': user.name, 'email': user.email}
if __name__ == '__main__':
  app.run(host='0.0.0.0', port=81)

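The login and logout routes are self-explanatory. Notice that the logout route doesn't take any user-identifying parameter: calling logout_user clears the login information from the session attached to the request, which logs the user out. One caveat: Flask's built-in session is a signed cookie stored on the client, not a server-side table like the one shown earlier. To keep sessions on the server (for example, in Redis) so they can be deleted at will, you can plug in an extension such as Flask-Session.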

To access certain profile data, you have to be logged in as that user.

That's it! The full code can be accessed and run from this Repl (Click Show Files to see the code).

Where to use JWT?

Don't get me wrong, JWT is a solid solution, but it works best in certain scenarios, not all. In a gateway backend (backend directly communicating with web client), JWT complicates things. Besides server implementation, you have to maintain a whole lot of logic in the front end. On the other hand, cookies make everything simpler.

Although JWT might not always be the right choice for your web API authentication, it definitely has its uses. It's a great way to grant temporary access to protected third-party resources. In other words, JWT is best suited for API-to-API communication.

Suppose, you use Spotify. Now, for the sake of this example, let's say a song is stored in AWS S3 and the access URL looks like this

https://s3.spotify.com/music/example-music.mp3?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

Notice the token query parameter. It carries a JWT that verifies you're an authenticated Spotify user and may access the music file for the next X minutes. File access is usually not directly tied to your web API, so a stateless token is very useful for granting privileged access to files.
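Here is a rough sketch of that flow, assuming an HS256 token signed with a secret shared between your API and the file server. The host files.example.com and the functions make_file_url and can_access are hypothetical. (Amazon S3's real presigned URLs use AWS Signature Version 4 rather than a JWT, but the principle is the same: a signed, expiring credential in the query string.)

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')

def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + '=' * (-len(segment) % 4))

def _sign(signing_input: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, signing_input.encode('ascii'), hashlib.sha256).digest())

def make_file_url(base_url: str, user_id: str, secret: bytes, ttl: int = 300,
                  now: Optional[float] = None) -> str:
    # Issued by the API after it has authenticated the user
    now = time.time() if now is None else now
    header = _b64url(json.dumps({'alg': 'HS256', 'typ': 'JWT'}).encode())
    payload = _b64url(json.dumps({'sub': user_id, 'exp': int(now) + ttl}).encode())
    token = f'{header}.{payload}.{_sign(header + "." + payload, secret)}'
    return f'{base_url}?{urlencode({"token": token})}'

def can_access(url: str, secret: bytes, now: Optional[float] = None) -> bool:
    # The file server only needs the shared secret: no session lookup, no user DB
    now = time.time() if now is None else now
    tokens = parse_qs(urlsplit(url).query).get('token')
    if not tokens:
        return False
    header, _, rest = tokens[0].partition('.')
    payload, _, signature = rest.partition('.')
    if not hmac.compare_digest(_sign(f'{header}.{payload}', secret).encode(), signature.encode()):
        return False
    claims = json.loads(_b64url_decode(payload))
    return claims['exp'] > now
```

can_access needs nothing but the shared secret and the current time, which is exactly why a stateless token suits this job: the file server never has to ask your application who the user is.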

The following diagram can help you understand this

In the above example, the Gateway API uses session cookies, and the related services live in a private network. S3 access, however, is separate from your business logic, so S3 shouldn't need to know anything about your application. That makes a short-lived JWT a good fit for granting controlled access to your S3 files (like the music files in our Spotify example).

Thanks for reading this far. I hope this article clarified the concepts. If you liked it, please share it with your network. If you have any feedback or questions, drop them in the comments. Thanks again!

Docker: A conceptual overview

Ahmed Sadman Muhib — Sat, 03 Sep 2022 09:46:52 GMT

In modern DevOps, Docker is the most popular set of tools for application containerization. This article is NOT a Docker tutorial, since plenty of good tutorials are already available on the internet. Instead, we will dive into the theory of how Docker works behind the scenes.

"Docker" is not a single tool, but rather a set of tools/components:

Docker Client (user CLI to talk with other Docker components)
Docker Engine (the core engine that does the actual containerization work)
Docker Compose (define and run multi-container applications)
Volumes (persistent disk space)
Docker Hub (place to store built images) etc.

What is an Image?

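A read-only template that bundles a program together with all its dependencies and configuration, and can be used to create containers that run on any machine with Docker installed. The image works as the blueprint for creating containers. (Under the hood, an image is a stack of filesystem layers plus some metadata rather than literally a single file.)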

What is a Container?

A running instance of an image. The same container can run on any system with Docker installed (and a compatible CPU architecture) without extra hassle.

Docker is used to create images and to run containers derived from those images. Images can be stored in container registries such as Docker Hub. You can think of Docker Hub as GitHub, but for Docker images. Note that you can also run your own private container registry and keep your images there.

How does Docker work behind the scenes?

Before jumping into how Docker works, we need to understand how a process communicates with the hardware.

Processes cannot talk to the hardware directly. They have to go through the Kernel, which acts as the bridge between hardware and software.

A process talks to the Kernel through system calls. System calls are the API the Kernel exposes to programs, for example to open a file, allocate memory or send data over the network.

Now, let's say you have two processes, Spotify and VSCode. For the sake of this example, let's assume Spotify requires Node v12 while VSCode requires Node v16, and that we cannot install multiple Node versions on the same machine. In such a scenario, one of the applications will work and the other won't. This is where Docker comes into play. Look at the image below:

Here, Docker gives each process its own isolated segment of the hard disk and redirects each process to the segment it needs. In this way, Docker avoids dependency and configuration conflicts that would otherwise occur. That's also why we say a containerized application can run on other computers without any issue.

But the burning question is, how does Docker achieve this isolation?

Namespaces and Control Groups

To achieve this, Docker uses two low-level Linux kernel features: namespaces and control groups (cgroups).

Namespacing: Isolating resources per process or group of processes, so that each one gets its own view of the filesystem, process IDs, network and so on. For example, isolating the separate versions of Node.js on the hard disk in our example above.

Control Groups: Limiting the amount of resources a process can use. This can be used to dictate how much processing power, memory, etc. a containerized process may consume.
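On a Linux machine you can see both features from Python by reading /proc. A sketch, assuming a Linux host (the helper names are hypothetical): each entry in /proc/self/ns is a symlink whose target, of the form pid:[inode number], identifies the namespace the process belongs to, and /proc/self/cgroup lists the control groups it is placed in.

```python
import os

def namespace_ids(pid: str = 'self') -> dict:
    """Map namespace type -> kernel identifier such as 'pid:[<inode>]'."""
    ns_dir = f'/proc/{pid}/ns'
    if not os.path.isdir(ns_dir):  # not Linux, or /proc unavailable
        return {}
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            pass  # some entries may be unreadable without extra privileges
    return ids

def cgroup_membership(pid: str = 'self') -> list:
    """Lines of the form 'hierarchy-ID:controllers:path' (cgroup v2 shows '0::/path')."""
    try:
        with open(f'/proc/{pid}/cgroup') as f:
            return [line.strip() for line in f if line.strip()]
    except OSError:
        return []

for kind, ident in namespace_ids().items():
    print(kind, ident)
print(cgroup_membership())
```

Run it on the host and then inside a container, and the namespace identifiers differ: the containerized process lives in its own namespaces and its own cgroup.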

The Docker Engine or more specifically Docker Daemon (a process named dockerd) is in charge of managing all these things.

But....
There's a catch...

Namespaces and control groups are ONLY supported by the Linux kernel.

Then, you might ask,

How does Docker run on Windows/macOS?
In these cases, Docker Desktop runs a lightweight Linux virtual machine on your operating system and runs Docker inside it.

Note that these Linux kernel features are not new and have existed for quite a long time. In fact, the concept of containerization is not new either.

Now, from our previous diagram, let's mark out the containers:

In the above image, there are two containers, both outlined in orange. For simplicity, it shows only the hard disk as a resource.

And here is a zoomed view of a single container with each resource shown:

One extremely important point: the kernel here is independent and extends beyond the container boundary. A single kernel is responsible for controlling multiple containers. We will get back to this later.

From images to containers

Let's discuss images in more depth and see how they are actually used behind the scenes to create containers.

Images keep a copy (or in more formal terms snapshot) of the file system that is required by a specific program, along with configuration and startup command.

When you create and start a container (for example, with docker run), the following happens:

Look for the container image on your local system
If it's not found locally, pull the image from Docker Hub (or another configured registry)
Create a placeholder container
Replace the placeholder container hard disk portion with the file system snapshot of the image
Run the startup command (and other necessary configs, if any) found in the image and start the container
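The steps above can be mimicked with a toy model in Python. This is an analogy, not Docker's actual implementation, and the names REGISTRY, local_images, get_image and create_container are hypothetical. Real images are stacks of read-only filesystem layers, and Docker's storage drivers (such as overlay2) place a writable layer for each container on top of them; collections.ChainMap behaves similarly, reading through to the layers below while sending writes to the first map.

```python
from collections import ChainMap

# Toy stand-ins: a remote registry (think Docker Hub) and the local image cache.
# Each image is a list of read-only filesystem layers (top layer first) plus a startup command.
REGISTRY = {
    'node-app:16': {
        'layers': [{'/app/app.js': 'console.log("hi")'}, {'/usr/bin/node': 'node v16'}],
        'cmd': ['node', '/app/app.js'],
    },
}
local_images = {}

def get_image(name):
    # Steps 1-2: use the local copy if present, otherwise "pull" it from the registry
    if name not in local_images:
        local_images[name] = REGISTRY[name]
    return local_images[name]

def create_container(image_name):
    image = get_image(image_name)
    # Steps 3-4: an empty writable layer (the "placeholder") stacked on the image's
    # layers; reads fall through to the image, writes stay in the container's own layer
    filesystem = ChainMap({}, *image['layers'])
    # Step 5: the startup command comes from the image
    return {'fs': filesystem, 'cmd': image['cmd']}

a = create_container('node-app:16')
b = create_container('node-app:16')
a['fs']['/app/data.txt'] = 'written by container a'
print('/app/data.txt' in b['fs'])   # False: b never sees a's write
print(a['fs']['/usr/bin/node'])     # node v16, read from the shared image layer
```

Both containers share the same image layers, yet each gets its own writable space, which is why one image can back many independent containers.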

Hypervisor vs Docker

Remember when I said a few sections back that the kernel is independent and a single kernel handles all container communication with the hardware? That single independent kernel is the host kernel. So if you're running a Linux operating system, the host Linux kernel itself handles all the containers.

On the other hand, a hypervisor is an older virtualization technology (often relying on hardware support) that runs complete virtual machines, each with its own copy of an operating system and kernel, even if you only want to run a single application. Compared to Docker, this adds a lot of overhead, including longer boot times. If you have used VMware or VirtualBox, you've used a hypervisor.

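In the case of Docker, the host kernel manages everything, so a complete copy of the OS is not needed. That makes containers much lighter and faster to start, which is why they are usually the better fit for packaging and shipping applications. Hypervisors are still widely used, though, wherever stronger isolation or a different kernel is needed: cloud providers run customer workloads (containers included) inside virtual machines, and Docker Desktop itself relies on a Linux VM.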

Conclusion

Docker is a very powerful tool that has revolutionized how we deploy applications on the web. As a Software Engineer/Developer, it is an absolute must that you have the basic working knowledge of Docker. I hope you have enjoyed this article, if you have any questions, please use the comment section. Also, don't forget to follow and subscribe to my blog. Have a great day!

Introduction to Microservices architecture with Tinder

Ahmed Sadman Muhib — Fri, 07 Jan 2022 12:47:55 GMT

Howdy, folks! It's been a long time since I wrote an article. Today, I'm restarting with one that will help beginners learn what microservice architecture is, how it works, and its pros and cons. After that, we will design a mock Tinder application using microservices.

Microservices is an architectural design pattern followed by many tech giants to build highly scalable and reliable systems. Netflix was one of the first large companies to popularize the microservices architecture. Before learning about microservices, we need to understand what came before them.

Big Old Monoliths

To create a basic web application, we need a) a server and b) a database. Together, these pieces make up a single service. As your application grows and gains popularity, it won't stay basic for long. Soon you will be wading through thousands (if not millions) of lines of code and countless files and folders, all working together to provide functionality to your customers. The code might be hosted in version control like GitHub, with all of your teammates working on that single repository. This is a monolithic architecture: all of your application's functionality, like user auth, user profiles and core features, resides in a single place.

Pros

Easy to detect issues throughout the system
Infrastructure maintenance is relatively simple and less expensive
Easy to onboard new team members as the codebase is in a single place
Doesn't require a sophisticated engineering team to handle the system

Cons

Difficult to scale properly as each and every functionality is tightly coupled
Adding new features and code maintenance can get overwhelming very easily
No fault isolation. If one part of your application goes down, the whole application goes down.
Difficult to distribute development across teams
Usually limited to a single type of tech stack. So if you're working with the MERN stack, you will have to deliver the whole functionality in that single stack.
Build and deploy process can be very slow.

Let's split it up: on to Microservices

When we adopt microservices, we need to separate concerns. Instead of creating a single giant service (a monolith), we split the application by functionality into small services (microservices). There are certain criteria, like data duplication, scope and independence, that make something an actual microservice, but we can skip those for this article. Each of these services might have its own database and its own servers. For example, if you're creating a social media application like Facebook, it might have the following services:

Authentication Service: Responsible for user authentication and authorization
File storage service: Responsible for serving and storing media files
Friends service: Responsible for storing friend relations
Feed Service: Responsible for generating your personal feed based on relevancy

...and many more. Please note, if you used a monolith architecture, all of these services would be treated as a single one.

Pros

Each unit is decoupled so upgrading and maintaining a single part is much easier.
Scaling is more flexible and easier. Each microservice can be scaled independently. You can quickly point out which service is load-intensive and scale just that.
Improved fault isolation. So if your 'file storage' service goes down your 'authentication' and 'friends' service would still work. This is possible because each service has its own set of machines (servers, databases etc).
Easy to distribute development to different teams. One team can be responsible for one microservice
You can use different tech stacks for different microservices. For a service that involves machine learning, you can use Python; for an IO-heavy service, you can use something like Node.js, and so on. You get the idea.
It can make onboarding new members easier, assuming one team looks after a single service.
Builds and deployments are much quicker, as you're working on a small service.

Cons

The system architecture or infrastructure gets extremely complex
Needs a very capable and much bigger engineering team
Expensive
Detecting issues or debugging gets relatively difficult as different services are communicating over the network. More tools are needed for logging and monitoring
Drawing the boundaries between microservices poorly causes more problems than it solves.

WARNING: The concepts here are very simplified for new learners, so this article should not be treated as a full-fledged guide to microservices. In reality, separating concerns is not as easy as it seems. They are called 'micro' for a reason (not 'mini'). Don't confuse 'microservices' with 'service-oriented architecture'. Yes, they are different!

Monoliths or Microservices, which one is better?
Depends. Unless you're working on an inherently complex application that needs to scale to millions of active users, you can safely avoid microservices. Even then, a monolith might be a better option depending on your team and its capabilities. Also, it is hard to get real hands-on experience with microservices outside a professional setting.

Designing Tinder using Microservice architecture

Now comes the fun part, we will design a mock version of the popular dating application Tinder. Please note, this is not an actual design of Tinder and is for educational purposes only.

Before jumping to the microservice-based architecture, let's see how it would look if it's built using the monolith architecture:

Let's jump into the microservices design part. For Tinder, I've decided to create the following microservices:

Gateway service: This is the service users connect to from the public internet. It is responsible for some initial processing and for directing each request to the relevant service. Most microservice-based applications have a gateway service (often called an API gateway) like this.
Auth Service: Used to authenticate/authorize user requests
Storage Service: Stores and serves all kinds of media files uploaded by users, such as images and videos
Messaging Service: A socket-based messaging server that will be used for one-on-one chatting/messaging purposes among the matched users
Profile Service: Stores users and their match-related information
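To make the gateway's job a little more concrete, here is a minimal Python sketch of prefix-based request routing. The URL prefixes and service names are hypothetical, chosen only to mirror the services above; a real gateway would also forward the request over the network, check authentication, and so on.

```python
# Hypothetical routing table: URL path prefix -> internal service name.
ROUTES = {
    "/auth": "auth-service",
    "/media": "storage-service",
    "/messages": "messaging-service",
    "/profiles": "profile-service",
}


def route(path: str) -> str:
    """Return the name of the service that should handle `path`."""
    # Check longer prefixes first so that a more specific route wins.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        # Match the prefix exactly or as a whole path segment ("/auth/..."),
        # so that "/authx" does not accidentally match "/auth".
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    raise LookupError(f"no service registered for {path!r}")


print(route("/profiles/42/matches"))  # profile-service
print(route("/messages/inbox"))       # messaging-service
```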

This is enough for our learning. In reality, assuming Tinder uses microservices, it certainly has more than 5 services. It might be 50, 100 or 150; we may never know. As an example, Netflix uses 300+ microservices to keep the platform running smoothly.

Let's see how this architecture boils down into a simple diagram:

Each circle represents a running container, usually a Docker container. A container might run on the same machine as another container or on a different one. Together, the containers form a cluster, and the cluster decides how to balance the load and route communication among them. This is handled by a container orchestration tool such as Kubernetes.

Let's talk about scaling a little. Say that, after some logging and monitoring, the engineers at Tinder see that the most demanding service is the messaging service (obviously!). They can then scale just the messaging service like this:

We just horizontally scaled the messaging containers. Please note, some earlier portions of the image are cut off to fit smaller screens. The rest of the architecture is the same as in the previous images.
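The cluster, not the application, usually decides which replica gets each request. As a rough illustration of the idea, here is a minimal round-robin balancer in Python. The container names are hypothetical, and real orchestrators such as Kubernetes use their own load-balancing strategies, so treat this only as a sketch of the concept.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out replicas in turn so each one gets an equal share of requests."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("need at least one replica")
        # cycle() repeats the replica list forever: 1, 2, 3, 1, 2, 3, ...
        self._replicas = cycle(replicas)

    def next_replica(self):
        return next(self._replicas)


# Three hypothetical messaging-service containers after scaling out.
balancer = RoundRobinBalancer(["messaging-1", "messaging-2", "messaging-3"])
assignments = [balancer.next_replica() for _ in range(6)]
print(assignments)
# ['messaging-1', 'messaging-2', 'messaging-3', 'messaging-1', 'messaging-2', 'messaging-3']
```

Adding a fourth replica would only mean passing one more name to the balancer; the rest of the system doesn't change, which is the whole point of horizontal scaling.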

Handling the communication chaos

Now, it is very likely that when some event occurs, multiple services have to know about it. User registration is such a scenario. When the user registration happens, we need to do the following:

Store the user-related information in the Auth service and send the processed data to the Profile and Messaging services
Store and process the user media (like profile pictures and gallery images) in the Storage service
Store and process the user details in the Profile service
Store and process session-related information in the Messaging service

We could let the services communicate with each other directly, but this is far from a perfect solution. It quickly gets tangled and complex because communication happens in many places. The left side of the figure below depicts this situation. Imagine what would happen when you have to communicate across 100 services. Also, from an engineering perspective, we tightly couple the services as soon as we put inter-communication logic inside them. The goal should be the exact opposite: decoupling. By decoupling the services, we can scale each of them independently.

To solve this issue, microservices use a common bus channel for communication. This is essentially a message queue with publishers (or producers) and subscribers (or consumers). Any service can subscribe to a message queue and listen to it, and one service can be both a consumer and a producer at the same time. The right side of the figure depicts this.

This is much simpler. No service communicates with another directly. Instead, every service publishes its events, along with the necessary data, to a common channel. As a result, one service doesn't have to know about another, which ensures decoupling. The subscribed services are informed of the events published on the channel.
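Here is a minimal, in-process Python sketch of the publish/subscribe idea using the user registration example. The EventBus class and the user_registered event name are hypothetical; a real broker like RabbitMQ delivers messages asynchronously over the network and persists them, which this toy version does not.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """A toy, synchronous stand-in for a message broker such as RabbitMQ."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher never references the consumers directly.
        for handler in self._subscribers[event_type]:
            handler(payload)


bus = EventBus()
received: Dict[str, dict] = {}


def make_handler(service: str) -> Handler:
    """Build a handler that records which service saw the event."""
    def handle(event: dict) -> None:
        received[service] = event
    return handle


# Each service registers interest in the event; none of them knows about the others.
for service in ("profile", "messaging", "storage"):
    bus.subscribe("user_registered", make_handler(service))

# The Auth service publishes once after storing the credentials.
bus.publish("user_registered", {"user_id": 1, "name": "alice"})
print(sorted(received))  # ['messaging', 'profile', 'storage']
```

Adding a new service that cares about registrations only means one more subscribe call; the Auth service's code stays untouched.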

Technologies used for microservices

For deployment, we would need a combination of Docker, Kubernetes, Helm, Ansible, Jenkins and a few other tools. If you go the microservice route, you definitely need a very capable DevOps team.

For inter-service communication, as described earlier, message queues and streaming services such as RabbitMQ, Amazon SQS and Kinesis are used.

Logging is an important part. To implement proper logging and monitoring, we need a combination of Elasticsearch, Logstash, Kibana, New Relic, Sentry, etc.

That's all I know. There are definitely more tools that I'm simply not aware of.

Conclusion

As a first step for beginners, this article should be enough. If you want to explore microservices in more depth, you can watch this amazing guide on microservices from the one and only Netflix. That's all for today. Please leave your feedback or questions in the comments below. Have a good day.

Introducing ScreenView: The missing social platform for movie/tv show lovers

Ahmed Sadman Muhib — Fri, 27 Aug 2021 14:54:26 GMT

Do you love watching movies? Do you love watching TV and web series? Then ScreenView is for you. When the Auth0 hackathon was announced, my friend and I decided to build a platform for what we love most, which led to ScreenView. Before getting started with ScreenView, let's talk about Auth0 first.

What is Auth0?

Auth0 is an authentication-as-a-service platform. It makes user authentication extremely simple. You don't have to worry about security or managing users and roles because Auth0 handles all of that for you. This project uses Auth0 as its authentication service provider.

What is ScreenView?

ScreenView is a social platform for movie and TV series lovers where users can share their activities and stay updated with their friends. In short, it's like Facebook/Twitter, but targeted at movie and TV series lovers.

The problem

Traditional social platforms like Facebook/Twitter are not enough for sharing your media activities. Yes, you can share what you're watching with your friends, but those posts get lost under a pile of other, more general posts. Features like movie reviews, recommendations and discovery are not available on such platforms, although they are essential. It's tough to find people with similar watching interests because those platforms simply aren't built for a specific audience.

The solution

ScreenView. It makes all your dreams come true. Everything you need from such a social platform is already there. It is built by movie lovers, for movie lovers. You can do the following:

Discover upcoming, new and trending movies to watch
Follow people with similar interests
Stay up to date with what your friends are watching
Share your opinion in the comments
Create and keep track of your own Watchlist
Share your Watchlist with your friends
Write and share movie reviews with your friends
And a lot more interesting features planned for the future

The building blocks of ScreenView

Type	Name
Language	JavaScript/Node.js
Framework (Backend API)	Express
Framework (Frontend)	React
Database	MongoDB
ODM	Mongoose
CSS Framework	Tailwind CSS
Auth Provider	Auth0
Movie API Provider	TMDB

Challenges we faced

This project was technically challenging, and we faced many challenges along the way. The most noteworthy one was designing the business logic for user feeds. I had to find a way to fetch and show user feeds quickly and efficiently. After much hard work, I got it done to my full satisfaction. If you're interested, you can look at the MongoDB query in the code repo (linked at the end of the article).

Quick sneak peek

There are many features but to keep the article short I'm going to show you the main ones quickly.

In the below GIF, you can see the user posting a review of a movie. Remember, you can also post watch status, where you share what you're watching with your friends.

In the next GIF, you can see typical user activities like browsing through feeds, commenting, exploring the movie discovery section and viewing user connections.

And in the final GIF, you'll see how easy it is to add items to your Watchlist from anywhere. Interested in a movie your friend is watching? Just add it to your Watchlist right from the feed. Interested in a new movie you discovered? Just add it.

Try it out!

We're live: https://screenview.netlify.app

Code repository: https://github.com/ahmedsadman/screenview

Please note, what you see in your feed depends on the people you follow. This is a new site so it might take some time to gain traction. Until then, you can invite your friends to this site to grow your feed.

Hope you liked this work. Please don't forget to share it with your friends and family who love screen entertainment. Thank you!

How to build a strong profile for the tech job (engineering) market

Ahmed Sadman Muhib — Fri, 30 Jul 2021 13:18:15 GMT

Recent graduates often wonder what is stopping them from getting called for interviews. Let's say you're a very good programmer, but the recruiters don't know that. To prove your skills and your worth, you have to get called for the interview first. This is where a strong profile comes into play. By profile, I mean your LinkedIn, GitHub and your overall brand as a programmer. In this article, we will discuss different tips for improving your profile. These are the same steps I followed myself.

The process can be divided into mainly two categories - building your brand and networking.

Building your brand

A strong GitHub profile will speak for itself

Do projects

Try to build some unique projects; they don't have to be for academic purposes only. In fact, doing projects on your own is the best way to learn and grow. All those projects should be recorded on GitHub with proper commits and workflow. What I mean is, don't build the whole project at once and publish it as a single commit. Your commits should reflect the development path.

Avoid hotel management or library management types of projects. They are very common, easy to implement, and complete code for them is available on the internet. If building something unique is tough, pick a common project but make it more challenging. Want to make a to-do app? Fine. Instead of a simple to-do app, make it collaborative. With just a simple collaboration feature, you have introduced a challenging aspect to your project. Try to incorporate challenging features in your projects so that recruiters know you can create innovative solutions. The projects should also have a visual component that anyone can use. Think of the recruiter as your customer: he should be able to use your project and benefit from it without knowing the technical details. So, if you're making a web application, keep a live version running online for demo purposes.

Try to avoid frontend-only projects

If you can work full-stack, your job options will broaden drastically. And when trying to land your first job, you should be open to more roles. So, try to incorporate the backend. If you think of yourself as a mobile application developer, you could try building your own version of the API that you initially built with Firebase.

Preach your projects

Don't just do projects and relax. Spread the word. Use Facebook, LinkedIn and Reddit. Try to gather stars for your GitHub projects. Some subreddits where you can showcase your projects are r/webdev, r/python and r/programming. In your post, explain what the project does, what kind of problem it solves and, most importantly, what challenges you faced and how you solved them. It's even better if you can attach a demo video of your project.

Get involved in open-source contributions (optional)

This will drastically boost your profile. You can read this comprehensive guide on how to start contributing to open-source projects. With such contributions, you will learn how to understand a large codebase and work in a team-like environment. Of course, noteworthy contributions should be mentioned in your resume.

Please note, getting started with open-source contributions is difficult. If you don't like it, then don't stress over it.

Build the homepage

GitHub now offers an amazing way to introduce yourself. You just have to create a repository with the same name as your GitHub username, like this. This is the best place to showcase your work. You can find some inspiration in the Awesome GitHub Profile Pages repo. Add your intro here and pour your full creativity into it.

Most importantly, don't forget to pin your top projects on the GitHub front page.

If you don't have a tech blog, start one. Write about the things you learn day to day. Blogging will help reinforce your learning. You can write articles for beginners like you, or advanced articles if you feel confident enough. When other people read your blog, they will understand how you think and how you solve problems. But be careful not to publish wrong information. To start your blog, you can use Hashnode or Medium, whichever you prefer.

Alternatively, you can start your own YouTube channel if you feel like it. Even better, you can do both. One of my co-workers got called by Amazon just because she had an awesome YouTube channel teaching computer science.

Whatever you do, preach it. Use the platforms described earlier. People should know about your blog.

Internships and part-time jobs

As soon as you can, get involved with internships and part-time jobs during your academic life. This will boost your profile without any doubt. The best way to get into internships or part-time jobs is via networking. Keep a good connection with your university seniors and related online groups and platforms.

A simple and minimalistic resume

Create a simple resume without icons or fancy designs. To create a resume, I would recommend either Novoresume or a LaTeX-based resume. You can find LaTeX resume templates on Overleaf. The resume should be a single page, in either a double-column (equal weight) or a single-column layout. Your photo is not necessary. Your resume should contain the following sections, in order (assuming you're a fresh candidate in the job market):

A small introduction about yourself, as concise as possible. Describe what kind of role you're interested in and how you can add value to a company
Education
Skills (Technical, specifying soft skills is not important)
Experience (include any part-time, full-time, remote or freelance job experience. Nobody cares if you were the president of your university's Computer Society or Debating Society; these are not work experience)
Projects (a short description of each project along with the tools/tech stack used to build it). Also, add the GitHub link to each project. Don't add the demo link here; your GitHub README should contain the demo link
Achievements (Any contests or hackathons you participated in and made an impact)

Also, at the top of your resume, don't forget to include your LinkedIn and GitHub profile links. Double-check your grammar and spelling.

When writing about your experience, follow the action-outcome rule. So instead of writing this:

Implemented feature X in the application

Write this:

Implemented feature X in the application which helped the company retain 2X more customers

Grow your network

Participate in hackathons

Whenever you get an opportunity to join a hackathon, take it, regardless of whether you feel prepared. Use it as an opportunity to connect with other developers and important people in tech. This will help you grow your network and your recognition.

Hackathons will also teach you how to work in teams and come up with innovative solutions under strict time constraints. They will test your patience and discipline. You can pick a few friends with whom you consistently participate in these hackathons. Don't forget to mention any noteworthy events in the Achievements section of your resume.

LinkedIn says it all

Recruiters will usually look at your LinkedIn first, so keep your information updated. At a minimum, your profile should contain a photo, a cover photo, a short introduction, experience, honors and awards, and languages.

For each company experience, try to describe what you did using the action-outcome rule described earlier. I see many people make the mistake of putting social and club activities in their LinkedIn experience section. That puts a question mark on your professionalism. LinkedIn has a separate section called 'Volunteering Experience' where you can put those things. Remember, it's okay for a fresh graduate not to have any solid experience.

As you know, you can set a title on your LinkedIn profile. Instead of writing Software Engineer, you should write Aspiring Software Engineer. Similarly, use Aspiring Data Scientist instead of Data Scientist. This way, when you sit for the interview, the interviewers will go easier on you because you didn't claim to be an engineer while having zero experience. It also makes it clear that you're not boasting or being over-confident.

Build valuable connections

Please don't treat LinkedIn like Facebook. It is not the place for adding your Facebook friends (although you can, and that's totally fine). Focus on building connections with people who share your job interests and who already work in the sectors that interest you. Don't miss the chance to connect with HR staff and recruiters at bigger companies.

If you have some target companies (which you should) that you would like to work for, get connected with the people who work there.

Don't hesitate to reach out to strangers

The primary goal of LinkedIn is to connect you with other professionals, so don't hesitate to initiate first conversations.

In fact, when you enter the job market, do the following:

Connect with the people from the target companies
Endorse them for a few skills
Message them asking for a referral, introducing yourself and showcasing your skills.

At first, it might seem a bit lame to endorse people you don't know. But you have to understand how endorsements work. That person is already working in a good company and other people have already endorsed him for his skills. You can select the option "Heard from other people about his skills" when endorsing.

You will see that more than 50% of people will either reject or ignore your referral request. But that shouldn't stop you from continuing.

You can follow this message format to connect with people by leaving a note:

Hello, I am X. I am really impressed with your profile. I'm interested in a similar job sector/your company so I would like to connect with you.

You can follow this message format to ask for a referral:

Hello, I am X. I have been following your company for a long time and I'm also impressed with your profile. I see there are some job openings right now. Here is my GitHub profile (put your GitHub) and here are some of my noteworthy projects I would like you to check out (put some of your project demo links along with GitHub code). I would be very grateful if you can refer me for the X job position. Thanks in advance. Looking forward to hearing from you.

Ask for recommendations

LinkedIn has a section for recommendations. Having some recommendations will attract more recruiters to your profile. Try to get recommendations from industry professionals. These can be your manager during an internship or part-time job, a CEO/CTO or even coworkers. If possible, get recommendations from high-ranking professionals. One recommendation from a highly qualified professional is better than five ordinary ones. Aim for at least three recommendations in total. Although less important, you should also ask people for skill endorsements.

Coursera Certifications

This is a debatable topic. Although gathering Coursera certifications has become a trend, they add little to no value when it comes to landing a job. So, I would suggest enriching your skillset and doing projects, because that's what will help you in the long run. After that, if you have enough time, feel free to earn certifications and show them off on your profile.

These are general guidelines; you don't have to follow all of them. For example, you can do fine without starting your own blog. The same goes for open-source contributions. Not everybody has to follow the same path to become successful. Some people might get a job just through networking. Others might need both networking and branding. It varies from person to person, and luck is an important factor. But ideally, you should try to tackle as much as possible.

Preparing for interviews

Now that you know how to make your profile attractive, time to prepare for interviews. Different companies have different interview procedures:

Some companies will assess only your data structure and algorithms knowledge
Some companies will assess your technical competency, critical thinking and teamwork capabilities through long discussions about your projects and past experiences. They may also give you tough projects under an extremely strict time constraint.
Some companies will do a combination of the two

If you know your target company's interview procedure, you can prepare accordingly. But it's always a good idea to prepare for every scenario. You've already done projects, so be prepared to answer questions about the ones you put on your resume.

Now it's time to sharpen your basic computer science knowledge.

To sharpen your knowledge of data structures and algorithms, allot 6-8 months before attempting the first interview of your life. During this period, focus only on DSA (data structures and algorithms) problems. You should use LeetCode for interview preparation. Try to do around 100 easy and 100 medium questions, then continue solving more of your choice. How many questions and which types you solve is purely subjective, though. Don't stress over it if you're not good at it. Most companies will ask easy questions, and some will skip DSA altogether. With 6-8 months of preparation, you can also attempt FAANG interviews.

There's a myth in South Asian countries that you have to be a competitive programmer to get into FAANG companies, which is not true. Getting into a FAANG company from a developing country is always tough because the company's expectation bar is very high in that case, no matter what you do. For example, if you're applying to Google from Bangladesh, their expectation bar will be very high just to get called for an interview. On the other hand, if you're a US citizen, the expectation bar will be standard and you will get called fairly easily. Maybe you're worthy of a FAANG company, but to prove that, you have to get called for the interview first. Here, a strong profile comes into play again; how you built it (competitive programming, projects/open-source contributions or job experience) doesn't matter.

That's it for today folks. Don't forget to leave your feedback and questions in the comments below. If you want more articles like this, subscribe to/follow my blog. Thank you. Have a nice day!

Understanding password hashing and salting for enhanced security

Ahmed Sadman Muhib — Fri, 23 Jul 2021 14:11:18 GMT

Passwords are the first and most important layer of security in any application. As a developer, it's your responsibility to ensure the highest possible security for your users. In this article, we will discuss password hashing, which provides strong protection for passwords in the event of a data breach.

What is password hashing?

Saving passwords as plain text is the worst. If an attacker compromises your database, he can see all of your users' passwords. This can also go against privacy laws. Firstly, as the developer, you should never know the user's password. Secondly, a user might reuse common passwords across different accounts, so such data leaks can be devastating for an individual. So, how do you protect passwords?

You use password hashing. Hashing is a cryptographic way to secure users' passwords. This way, no one can find out the actual password even by looking directly into the database. Even the application owners or developers can't tell what a user's password is.

In this procedure, the password is passed into a specific hash function, which then outputs a jumbled string. Let's say, my password is helloworld, then:

// SHA256
hash('helloworld') = '936a185caaa266bb9cbe981e9e05cb78cd732b0b3280eb944412bb6f8f8f07af'

Here, the algorithm used for hashing is called SHA256. Given the same string, a hash function is guaranteed to always generate the same output. But hashing is a one-way algorithm: once a string is hashed, there is no way to compute the original string back from the hashed output.
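If you want to try this yourself, Python's standard hashlib module can compute SHA256 digests. This small sketch just shows the two properties described above: the same input always gives the same output, and even a tiny change gives a completely different one. Plain SHA256 is used here only for illustration; the rest of the article explains why it isn't enough for passwords.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash a string with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("helloworld")
second = sha256_hex("helloworld")
print(first == second)                     # True: same input, same output
print(len(first))                          # 64 hex characters (256 bits)
print(first == sha256_hex("helloworld!"))  # False: any change gives a different digest
```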

What is the difference between encryption and hashing?
Encryption requires a secret key. Whoever has the secret key can encrypt or decrypt as required. So, if you encrypt a string and you know the key, you can get back the original version using that key. Hashing, on the other hand, is one-way: there is no way to go back to the original string.

Whenever a new user registers with a service, his password is hashed and stored in the service's database. The user table might look like this:

id	name	password
1	alice	ea71c25a7a602246b4c39824b855678894a96f43bb9b71319c39700a1e045222
2	bob	3caf14423f92dbac6f80589aa72f7be572d7b87562f9fd32ed0c4427b680b598
3	john	cc6db669c9856670ab5faf03a76b4885d146e62a1fd18ab7693a525368e7d654

Now that you have stored the hashed password, how do you compare the passwords during authentication?

As I said earlier, hashing is a one-way algorithm. So, there is no way to get back to the original password. The authentication is done as below:

User provides his plain password during login
The plain password is hashed and the new hash is compared with the existing hash in the database
If the hash matches, the user is authenticated
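The three steps above can be sketched in Python as follows. This is a simplified version under stated assumptions: the users dictionary stands in for a real database table, and plain unsalted SHA256 is used only to mirror the explanation so far.

```python
import hashlib
import hmac

# In-memory stand-in for the user table: name -> SHA-256 hex digest.
users = {}


def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(name: str, password: str) -> None:
    # Only the hash is stored; the plain password is never saved.
    users[name] = hash_password(password)


def login(name: str, password: str) -> bool:
    stored = users.get(name)
    if stored is None:
        return False
    # Hash the submitted password and compare it with the stored hash.
    # hmac.compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(hash_password(password), stored)


register("alice", "unlockme")
print(login("alice", "unlockme"))  # True
print(login("alice", "wrong"))     # False
```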

Security considerations for password hashing

Unfortunately, such simple hashing is not enough. An attacker might use a precomputed rainbow table or hash table to crack the passwords. These two tables are different, but for simplicity, we're not going to discuss the difference. A rainbow table or hash table is basically a mapping from original passwords to their hashes, built from the most common password phrases. Hash tables are huge, containing millions (if not billions) of common passwords, and can take up to 100GB. Rainbow tables take less space, but lookups are slower. For the rest of the article, we will use the terms interchangeably. A hash table might look like this:

input	hash
hello	2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
unlockme	06f088e12170d07e80fced4405bb4d7c3ec0153d1e3c136be0f58502ef97b364
password1234	a28d722529366b0e1ff5d75c7ce5676b1a574a982d6b0a2b32050c565139dc8b

We know that, for a given string, the same hashing algorithm will always produce the same output. The hashing algorithms used for passwords are built to be slow to inconvenience the attacker. But in a rainbow table, the hashes are pre-computed (the attacker prepared the table earlier). The attacker just has to take the hashes from the database and do a reverse lookup to find the passwords. Thanks to modern CPU/GPU power and distributed computing, an attacker can crack weak passwords extremely quickly this way! That is why it is very important to use a strong password. Regardless, you have to solve this problem on your side because you can't expect every user to use a very strong password.
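To see why precomputation is so dangerous, here is a small Python sketch of a reverse lookup against unsalted hashes. It uses a plain hash table (a Python dict) rather than a real rainbow table, and the wordlist and leaked users are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Precomputed once by the attacker from a list of common passwords
# (a real table would have millions of entries, not three).
common_passwords = ["hello", "unlockme", "password1234"]
lookup_table = {sha256_hex(p): p for p in common_passwords}

# Unsalted hashes taken from a leaked user table (hypothetical users).
leaked = {"bob": sha256_hex("unlockme"), "carol": sha256_hex("a-much-rarer-phrase")}

for user, digest in leaked.items():
    # One dictionary lookup per user; no hashing is needed at attack time.
    print(user, lookup_table.get(digest, "<not in table>"))
# bob unlockme
# carol <not in table>
```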

When a database is compromised, the attacker immediately dumps it, which basically means saving a copy to their local disk. The attacks are then run against the saved copy to find the original passwords.

Oh, there is another problem. Let's say Bob and Alice use the same password, unlockme. In that case, both users' password hashes would be the same. The attacker would instantly know that the two people use the same password, which can help them find patterns and further accelerate the attack. For example, attackers can easily figure out that there is no salting (described below), that your website uses default passwords for new accounts, or that you're using a weak hashing algorithm. You definitely don't want this to happen.

To overcome these above-mentioned issues, we salt the hashes.

Salting your hashes

Salting is basically adding a random string to each password so that two identical passwords never produce the same hash. Salts can be appended or prepended to the original password. In our previous example, Alice and Bob had the same password, unlockme. If we salt it, it would look something like this:

Alice's password (salted with append): unlockmeea22$sas

Bob's password (salted with append): unlockme3xx#$/a5

The characters after unlockme (ea22$sas and 3xx#$/a5) are the salts. In a real-world scenario, the salt is generated using a cryptographically secure algorithm, meaning one that is resistant to cryptanalysis.

Proving a system to be cryptographically secure requires a huge amount of time, research, extensive testing and community engagement. Thus, it is always recommended to use a proven, standard cryptographic system. Don't try to roll your own.

Such algorithms make the salt practically unique and unpredictable. Now, although the password is the same, Bob's and Alice's passwords have different hashes:

hash('unlockmeea22$sas') = '1471dc5d222f43483aee85051691f45c7bf135e55d0b53184ffaf4cc4d30905f'
hash('unlockme3xx#$/a5') = '5983db3704436d24a6acccb87405cb002dfd2f04b22b451247d541b1dbca295a'
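Here is a minimal Python sketch of appending a random salt before hashing. The salts come from the standard secrets module (a cryptographically secure source); SHA256 is still used only to keep the example simple, since a real system should use a slow password hashing algorithm as discussed later.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: str) -> str:
    # Append the salt to the password before hashing, as in the example above.
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


password = "unlockme"
alice_salt = secrets.token_hex(8)  # 16 random hex characters
bob_salt = secrets.token_hex(8)

plain_alice = hashlib.sha256(password.encode("utf-8")).hexdigest()
plain_bob = hashlib.sha256(password.encode("utf-8")).hexdigest()
salted_alice = salted_hash(password, alice_salt)
salted_bob = salted_hash(password, bob_salt)

print(plain_alice == plain_bob)    # True: identical passwords, identical hashes
print(salted_alice == salted_bob)  # False, unless the two random salts collide (astronomically unlikely)
```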

The salt should be unique for each and every password. The application/backend has to know which salt was used to hash a specific password; otherwise, it will never be able to reproduce the same output. This is why the salt has to be stored along with the hashed password. Let's see how the user table would look with salting in place. In this example, the salt is stored before the hash, separated by a dot:

id	name	password
1	alice	fE07$/.ea71c25a7a602246b4c39824b855678894a96f43bb9b71319c39700a1e045222
2	bob	Xfk7aa.3caf14423f92dbac6f80589aa72f7be572d7b87562f9fd32ed0c4427b680b598
3	john	Q//$cc.cc6db669c9856670ab5faf03a76b4885d146e62a1fd18ab7693a525368e7d654

Suppose Alice's correct password is unlockme. When she wants to log in, she provides it in the login form. The server finds Alice in the user table and sees that the salt is fE07$/. So the salted string will be unlockmefE07$/. This string is hashed and then compared with the existing database hash ea71c25a7a602246b4c39824b855678894a96f43bb9b71319c39700a1e045222. If they match, Alice is authenticated. If they don't, Alice gave the wrong password.
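The storage format and login check described above could look something like this in Python. The salt.hash layout follows the table above; the helper names are hypothetical, and SHA256 again stands in for a proper password hashing algorithm.

```python
import hashlib
import hmac
import secrets


def _digest(password: str, salt: str) -> str:
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


def make_record(password: str) -> str:
    """Return 'salt.hash', the layout used in the table above."""
    salt = secrets.token_hex(8)  # hex characters only, so no '.' inside the salt
    return f"{salt}.{_digest(password, salt)}"


def verify(password: str, record: str) -> bool:
    # A hex digest never contains '.', so the last dot is always the separator.
    salt, _, stored_hash = record.rpartition(".")
    # Re-create the salted hash and compare it in constant time.
    return hmac.compare_digest(_digest(password, salt), stored_hash)


record = make_record("unlockme")
print(verify("unlockme", record))   # True
print(verify("wrongpass", record))  # False
```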

At this point, there is no way for the attacker to identify duplicate passwords.

But how does salting prevent rainbow table/hash table attacks?

As we saw earlier, plain hashing didn't prevent the attacker from cracking common/weak passwords. Salting solves that problem for us. In addition, unlike general-purpose hash functions like SHA256 or MD5, password hashing algorithms (like bcrypt, which is based on the Blowfish cipher) are built to be very slow. There is a very specific reason for it.

Previously, the attacker pre-computed the hash. The initial computation took a lot of time, but after the table was prepared the attacker could use it indefinitely.

But when salting is introduced, every hash is unique, because every salt is unique and unpredictable. Now the attacker can't use any pre-computed hash table. For each password, the attacker has to create a new rainbow table using that password's salt. This is a very tedious and time-consuming process.

In short, when the passwords were unsalted, the attacker:

Pre-computed a hash table containing the most common password phrases.
Whenever he compromised a database with unsalted hashes, he just had to compare the hashes with the hash table and do a reverse lookup.

Now, with the salting in place:

Attacker compromises a database and sees the passwords are hashed and salted
If the attacker wants to crack Alice's password, he has to use the salt fE07$/ and create a rainbow table with the common password phrases.
For every other user, the attacker has to create a new rainbow table with that user's salt and hope the password gets cracked. Remember that rainbow tables can contain billions of common phrases, and creating them is extremely expensive from both a time and a space perspective.
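Here is a rough Python sketch of the attacker's situation after salting. It shows that an unsalted precomputed table no longer matches anything, and that the attacker has to rehash the wordlist with each user's salt. Strictly speaking, this per-salt loop is a dictionary attack rather than building a full rainbow table, and the wordlist is hypothetical.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: str) -> str:
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


wordlist = ["hello", "unlockme", "password1234", "letmein"]

# The attacker's old, unsalted table is useless: salted digests never appear in it.
unsalted_table = {hashlib.sha256(w.encode("utf-8")).hexdigest(): w for w in wordlist}

salt = secrets.token_hex(8)
leaked_hash = salted_hash("unlockme", salt)
print(leaked_hash in unsalted_table)  # False

# Instead, the attacker must rehash the wordlist with this particular user's salt.
hashes_computed = 0
cracked = None
for word in wordlist:
    hashes_computed += 1
    if salted_hash(word, salt) == leaked_hash:
        cracked = word
        break
print(cracked, hashes_computed)  # unlockme 2
```

A weak password like unlockme still falls, which is why the slow hashing discussed next matters: it multiplies the cost of every one of those per-user hash computations.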

There's more to it. We have said several times that password hashing is deliberately slow to inconvenience the attacker. Let's see exactly how slow it is.

One of the most widely used libraries for password hashing is bcrypt. Here is an example of password hashing using bcrypt in Node.js:

// app.js
const bcrypt = require("bcrypt");

const saltRounds = 15;
const plainTextPassword1 = "veryweakpassword";

bcrypt
  .genSalt(saltRounds)
  .then(salt => {
    console.log(`Salt: ${salt}`);
    return bcrypt.hash(plainTextPassword1, salt);
  })
  .then(hash => {
    console.log(`Hash: ${hash}`);
    // Store hash in your password DB.
  })
  .catch(err => console.error(err.message));

Here, the variable saltRounds is important. It is the cost, or work factor, of the algorithm. genSalt only generates a random salt and records the cost inside it. bcrypt.hash then performs 2^cost rounds of its expensive key setup. So the higher the cost, the slower the hashing. The time required grows exponentially with the cost, and each increment doubles the work.

On a Core i7 system with 16 GB of RAM, hashing with a cost of 15 took around 20 seconds, whereas a cost of 20 took 66 seconds. In typical applications, the cost is lowered so that registering new users doesn't cause a significant delay. Even a 4-5 second delay is more than enough. Just imagine the attacker having to wait 5 seconds for each hash. There are billions of phrases to try, and salting forces the attacker to generate a new rainbow table for each password.
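The exponential effect of the cost factor can be observed locally. bcrypt isn't in Python's standard library, so this sketch substitutes `hashlib.pbkdf2_hmac` with `2 ** cost` iterations to mimic how a bcrypt-style cost factor scales. It is not bcrypt itself, and timings will differ from machine to machine. The second part is a back-of-the-envelope estimate. It assumes a single attacker machine and the article's 5 seconds per hash, and it ignores parallel hardware.

```python
import hashlib
import os
import time


def slow_hash(password: str, salt: bytes, cost: int) -> bytes:
    # Stand-in for bcrypt: the work grows as 2**cost, so each +1 doubles the iterations
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 2 ** cost)


salt = os.urandom(16)
for cost in range(12, 17):
    start = time.perf_counter()
    slow_hash("veryweakpassword", salt, cost)
    elapsed = time.perf_counter() - start
    print(f"cost={cost:2d} iterations={2 ** cost:6d} time={elapsed:.3f}s")  # roughly doubles per step

# Back-of-the-envelope: one billion guesses at 5 seconds each, against ONE salted user
guesses = 1_000_000_000
seconds_per_hash = 5
years = guesses * seconds_per_hash / (60 * 60 * 24 * 365)
print(f"{years:.0f} years")  # 159 years on a single machine
```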

Handling data breach

If your database is breached, you should immediately reset all users' passwords and inform them via email. Alternatively, a company can choose to ask its users to change their passwords rather than resetting them itself.

With enough computing power, the attacker might be able to reveal a few weak passwords, even though salting and hashing slow them down drastically. In such breaches, it is impossible to know exactly which passwords were compromised. So we should treat all passwords as compromised and inform the users accordingly.

Protecting yourself from a data breach

You're mainly at risk if you're using a weak password, one that appears in common password lists. So, always use a strong password.
Use a password manager like Bitwarden.
When registering on new websites, generate a long (around 20-25 characters) password with Bitwarden. With a strong password, the chance of it being cracked is practically zero. k8FQxVP&t28jB&!J!iCi is an example of a strong password that I generated with Bitwarden while writing this article.
Don't use the same password on multiple websites
Change the password of important or transactional websites every 1-2 months.
Additionally, use Have I Been Pwned to find out whether your passwords were ever compromised. If they were, change all your existing passwords.
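A password manager generates such passwords for you. The sketch below shows roughly what a generator of this kind does, using Python's `secrets` module. It is not how Bitwarden is implemented, and the 70-character alphabet is an assumption. The entropy figure assumes every character is chosen uniformly at random.

```python
import math
import secrets
import string

# Assumed character set: letters, digits and a few symbols (70 characters in total)
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*"


def generate_password(length: int = 20) -> str:
    # secrets.choice draws from the operating system's cryptographically secure randomness
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int) -> float:
    # Each uniformly random character adds log2(alphabet size) bits
    return length * math.log2(len(ALPHABET))


print(generate_password())            # a new random 20-character password on every run
print(f"{entropy_bits(20):.1f} bits")  # 122.6 bits
```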

Thank you for reading. Don't forget to leave your feedback and questions in the comments below. Also, if you like my articles then please follow/subscribe to the blog.

Synchronous vs asynchronous programming and their use cases

Ahmed Sadman Muhib — Fri, 09 Jul 2021 08:45:27 GMT

Synchronous and asynchronous are two different styles of programming that confuse a lot of new developers. Understanding them is essential for building performant and scalable systems.

In short, a synchronous program blocks further operations until the current task has finished executing. An asynchronous program, on the other hand, can start multiple tasks at once, then progress and finish them in overlapping time. The term overlapping is important here. In the rest of this article, we will look at these concepts in detail and see how they can be used to build efficient programs. To explain the differences between synchronous (parallel) and asynchronous (concurrent) programming, I will use a real-world restaurant example.

Concurrent and parallel restaurant orders

Concurrent/Asynchronous restaurants

Suppose, on a nice weekend you go to hang out with your friends in a restaurant. You see there's only one counter. You go to the counter and place your order.

The cashier forwards your order to the kitchen and gives you a token. When your order is prepared, the token number will be shown on the large display. So you get back to the table. While you're waiting for your order, you're having a good gossip with your friends. Because that's what people generally do, right?

When you're called to take your order, you pause your conversation, go to the counter to get the food and then get back and start eating. The conversation still goes on. At some point, you guys are done and leave the restaurant.

Now, think of yourself as a computer program

So, you're waiting in line to place the order, but the wait is barely noticeable because the line moves very fast. The reason is that the cashier only takes the orders and doesn't prepare the items on the spot. (A program/function/job waits to be submitted to the system.)

When it's finally your turn, you process the menu in your head, place your order and pay. (Program gives parameters and execution logic to the system)

Now you have to wait. But even while you're waiting, you've switched your attention to gossiping. So you're not totally idle and are still doing something productive while waiting. (Don't wait for the current task to complete; pick something else to work on.)

At one point, your order is prepared. You go to the counter, get your food, return to your table and start enjoying the meal. (When the initial job's processing is complete, pick up where the program left off with that task and continue).

So, we can say the story has three tasks. We assume that all the tasks are asynchronous:

Get the food
Gossip with friends
Eat the food

When you placed the order on the counter, you started task 1. When it was being processed, you were focusing on task 2. So the task status is:

Task 1 = In Progress
Task 2 = In Progress
Task 3 = Waiting for execution (dependent on task 1)

When your order is complete, you go to the counter to fetch the items. Then you return to your table and enjoy the meal. Task 1 is complete and task 3 starts:

Task 1 = Complete
Task 2 = In Progress (50% complete)
Task 3 = In Progress

Please note, task 2 is halfway complete because you started it much earlier. You're eating your food and also talking with your friends. Doing two tasks at once? So what? Remember, all the tasks are asynchronous. Besides, tasks 2 and 3 don't depend on each other, so they progress in overlapping time. For better understanding, think of it this way:

When you're eating, you're not talking. But your friends are talking so the conversation is still going on.
When you're not eating, you're talking.

You can think of your talking as a separate async process. After saying something, you wait for your friends to respond, and during that wait you eat your food. This goes on until both tasks 2 and 3 are complete.

That's it! That's how asynchronous programming works. Of course, we will jump into technical details in the later portions of this article.
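The restaurant story maps quite directly onto Python's `asyncio`. In this sketch, `asyncio.sleep` stands in for the kitchen cooking and for friends replying, and the durations are arbitrary. The point is that all three tasks share a single thread. Whenever one task is waiting, another one runs. Task 3 awaits Task 1's result, which mirrors the dependency in the story.

```python
import asyncio

log: list[str] = []


def say(message: str) -> None:
    # Print and record each event so the interleaving is easy to inspect
    print(message)
    log.append(message)


async def get_food() -> str:
    say("Task 1: order placed, waiting for token")
    await asyncio.sleep(0.3)  # the kitchen is cooking; the thread is free meanwhile
    say("Task 1: token called, food collected")
    return "burger"


async def gossip() -> None:
    for i in range(1, 6):
        say(f"Task 2: chatting ({i}/5)")
        await asyncio.sleep(0.1)  # waiting for a friend's reply also frees the thread


async def eat(food_task: "asyncio.Task[str]") -> None:
    food = await food_task  # Task 3 depends on Task 1 and waits for its result
    say(f"Task 3: eating the {food}")
    await asyncio.sleep(0.2)
    say("Task 3: done eating")


async def main() -> None:
    food_task = asyncio.create_task(get_food())
    # One thread, three tasks progressing in overlapping time
    await asyncio.gather(food_task, gossip(), eat(food_task))


asyncio.run(main())
```

In the printed log, the chatting lines appear both while the food is being prepared and after eating has started. This corresponds to the two task-status snapshots above.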

For now, just note that all the above things happened in a single-core processor with a single-threaded program. We will get more into this later.

Parallel/Synchronous restaurants

Now let's imagine the restaurant has 4 counters taking orders. There are long lines at all 4 counters, and they work in parallel. The catch is that the cook and the cashier are the same person: the person who takes your order also prepares the food for you.

You choose the first queue. When your turn finally comes, you place the order. Now you have to wait at the counter. You can't leave, because someone else might take your place. The cashier/cook comes back with your order after some time. While you're waiting, other people are also waiting, just like you, for their turn to order and get their food.

In this scenario, you're synchronized with the cashier/cook. You have to be there at the exact moment the cashier/cook finishes your order and hands it to you, otherwise someone else might take it. Your cashier/cook finally comes back with your order after you have waited a long time in front of the counter.

After getting the food, you get back to the table to eat. As you had to wait in the queue, you didn't get any time to have a good conversation with your friends. You just eat the food and leave the restaurant.

Now, imagine you're the computer program

The restaurant has 4 cashiers, which in the computer world translates to 4 processor cores. To use all of them, you need at least 4 threads. You will be able to relate the rest of the story to the computer world easily, so I won't break it down further.

For the rest of the article, I will use the words 'synchronous' and 'sync' (short form) interchangeably. The same goes for asynchronous/async.

Is asynchronous programming better than synchronous programming?

As you saw in the story, async programming needed only 1 processor and a single thread, whereas sync programming required 4 processors with 4 threads. Besides, you didn't get any time to gossip with your friends. It seems that synchronous programming is resource-heavy and doesn't let you do much work. Hence the burning question: is synchronous programming bad?

No. This is a common misconception.

The more precise answer is "it depends". Using asynchronous programming where it doesn't fit results in an extremely poorly performing application. Let's jump into the technical details, and everything will become as clear as the blue sky.

Synchronous vs Asynchronous: Technical differences

The most popular asynchronous platform is Node.js (a JavaScript runtime). On the other hand, Java, C++ and Python are all synchronous by default. Newer versions of Python support asynchronous programming through the asyncio module and async/await syntax, but at the time of writing, that support is still at an early stage and not very mature. Being able to do both synchronous and asynchronous programming might make Python even more popular in the future.

In Node.js, your asynchronous program runs on a single thread. Its architecture, the event loop, is built around that single thread. (Node.js does use a small internal thread pool for some I/O work, but your JavaScript code itself runs on one thread.) Maybe we will discuss this in a separate article in the future. If you're curious, look into the JavaScript Event Loop. So, anything you write in Node.js runs on a single thread and therefore on a single processor core at a time.

If you don't have a clear idea about threading, you can read my article on Python multithreading and multiprocessing. Although it discusses Python, it explains the concept of threading independently.

Async: The good side

Synchronous programs can be single-threaded or multi-threaded, depending on how the developer builds them. A (synchronous) Python web application can run a web process with multiple threads. Each thread handles one web request at a time and no more.

But due to the nature of asynchronous programming discussed above, a single thread can handle hundreds of requests. For this reason, such programs are very lightweight.

So, if it were a Python web application, you would need 100 threads to handle 100 different user requests at once, and creating and dropping threads is resource-intensive. If it's written in Node.js, you might be able to handle the same 100 users with a single thread, which is far lighter in terms of resource usage.
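Here is a Python `asyncio` sketch of one thread serving 100 concurrent requests. `handle_request` is a hypothetical handler, and `asyncio.sleep(0.1)` stands in for waiting on a database or the network. The sketch records the ID of the OS thread that ran each handler, so you can confirm that only one thread was involved.

```python
import asyncio
import threading
import time

seen_threads: set[int] = set()


async def handle_request(request_id: int) -> int:
    seen_threads.add(threading.get_ident())  # record which OS thread served this request
    await asyncio.sleep(0.1)  # simulated I/O wait, e.g. a database query
    return request_id


async def serve(n: int) -> list[int]:
    # Start all n requests at once; their waits overlap on the single event-loop thread
    return list(await asyncio.gather(*(handle_request(i) for i in range(n))))


start = time.perf_counter()
results = asyncio.run(serve(100))
elapsed = time.perf_counter() - start
print(f"{len(results)} requests in {elapsed:.2f}s using {len(seen_threads)} thread")
# 100 sequential 0.1s waits would take about 10s; here it is roughly 0.1s
```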

Async: The bad side

Wow! So good! Why don't we use async programming everywhere then? Unfortunately, everything good also has a significant bad side.

Async is at its best when you're doing I/O-heavy operations (network requests, database reads, etc.). But for CPU-intensive operations, asynchronous programs can be a nightmare, because CPU work is not like I/O. During a mathematical computation, the CPU (processor) is busy, and that blocks the only thread the program has. As a result, all other user requests (assuming it's a web app) are blocked until the current calculation finishes, resulting in very poor performance. In such cases, you should use something like Python or Java, which is synchronous.
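This sketch shows the blocking problem in a single-threaded `asyncio` program. A "heartbeat" task wants to run every 50 ms and stands in for other users' requests. A hypothetical `cpu_heavy` busy loop stands in for the mathematical work. Because the busy loop never awaits, the event loop cannot switch to anything else until it finishes.

```python
import asyncio
import time


def cpu_heavy(seconds: float) -> None:
    # Busy loop standing in for heavy mathematics; it never awaits, so it never yields
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


async def heartbeat(gaps: list[float]) -> None:
    # Stands in for other users' requests: it wants to run every 50 ms
    last = time.perf_counter()
    for _ in range(10):
        await asyncio.sleep(0.05)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


async def main() -> list[float]:
    gaps: list[float] = []
    hb = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.1)  # let the heartbeat tick normally for a while
    cpu_heavy(0.5)            # blocks the only thread; nothing else can run meanwhile
    await hb
    return gaps


gaps = asyncio.run(main())
print(f"longest pause between heartbeats: {max(gaps):.2f}s")  # about 0.5s instead of 0.05s
```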

Besides, error handling for asynchronous calls might get tricky.

Is async faster than sync in general?

No. Technically it might be, but in most cases the difference isn't noticeable. The main advantage of asynchronous programming is that it is very lightweight in terms of resource usage, both memory and processor. So with async programming, you can do a lot more with far fewer resources.

Use cases

Synchronous programming

It's best for applications that use the CPU heavily, for example, training ML models or doing heavy mathematics. That doesn't mean it can't be used for I/O operations. In fact, many I/O applications are built with synchronous languages like Java, C++ and Python. The only difference is that an asynchronous program would use fewer resources in those cases.

Asynchronous programming

There are many use cases, and they ultimately boil down to I/O operations. Still, here is a breakdown for better understanding:

I/O intensive web applications: You already know it's best for handling web requests, database requests etc. If your server has 4 processor cores, you can spawn 4 different processes with one thread each, increasing system throughput by up to 4x.
Hybrid web application: In most cases, you might need both CPU and I/O intensive tasks. In that case, you can combine both by creating separate services. Take Uber as an example. Uber might have a separate RouteCalculator service for calculating the best route. This is a CPU-intensive task, so it can be written in Python. On the other hand, for fetching user data or creating and updating users, Uber might have another UserService written in Node.js. The final application is the union of both services, which communicate with each other. Most popular web apps are built this way, using microservices (or similar) architectures.
Causing side-effects: For complex applications, there are a lot of secondary tasks that are not directly related to the end-user. For example, when a user registers on Facebook, it will notify the other services, send and aggregate event logs, generate friend suggestions for the future etc. As these are not directly related to the immediate registration, Facebook can reply with a success response while queuing those tasks asynchronously for later execution.
ML applications: Machine learning involves mathematical calculations. If your application works with ML, you can issue an asynchronous request to run ML jobs (written in Python maybe) while you finish up other tasks in your main application.
Performance improvements: Asynchronous operations can drastically improve application performance, or at least make the application appear faster. Separating async and sync operations is crucial for the best performance. For example, when you move folders in Google Drive, the folders move immediately in the UI, which makes it feel fast. The actual folder move on the backend is done asynchronously and may take slightly longer.
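The "causing side-effects" pattern from the list above can be sketched with an `asyncio.Queue`. The registration handler only enqueues the secondary jobs and returns immediately, and a background worker processes them afterwards. The job names and handlers are hypothetical and are not how Facebook actually does it. In production this role is usually played by a separate message queue or task system rather than an in-process queue.

```python
import asyncio

processed: list[str] = []


async def worker(queue: asyncio.Queue) -> None:
    # Background consumer: handles secondary tasks after the user already has a reply
    while True:
        job = await queue.get()
        await asyncio.sleep(0.05)  # simulated slow work (notifications, logs, suggestions)
        processed.append(job)
        queue.task_done()


async def register_user(name: str, queue: asyncio.Queue) -> dict:
    # Essential work (e.g. saving the user) would happen here; side-effects are only queued
    for job in ("notify_services", "log_signup_event", "build_friend_suggestions"):
        queue.put_nowait(f"{job}:{name}")
    return {"status": "success", "user": name}


async def main() -> tuple[dict, int]:
    queue: asyncio.Queue = asyncio.Queue()
    consumer = asyncio.create_task(worker(queue))
    response = await register_user("alice", queue)
    done_at_response = len(processed)  # 0: the reply is ready before any side-effect ran
    await queue.join()                 # later, the worker drains the queue
    consumer.cancel()
    return response, done_at_response


response, done_at_response = asyncio.run(main())
print(response, done_at_response, processed)
```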

That's it for today. Feel free to leave any feedback or questions in the comments below.

Introduction to system scalability and reliability

Ahmed Sadman Muhib — Sat, 19 Jun 2021 15:02:54 GMT

Scalability and reliability are measures of how well your application can be served to end-users. If you have a system that can serve millions of users without noticeable downtime, then we can say the system is highly scalable and reliable.

When you're working on a product with a large user base, a lot of things change. Google has to process at least 100K requests per second, and without a robust, scalable architecture that would be impossible. Scalability and reliability are probably the main reasons we need Software Engineers in the first place.

In this article, we will go through a conceptual overview of software scalability and reliability. To follow it, you should know the basics of web development and the cloud. It's an engineer's job to make the product scalable and robust. But please note that these topics are very broad, and entire books have been written about them. The way the whole system is laid out is called the architecture of the system, and it addresses scalability/performance and reliability. Besides regular Software Engineers, companies have Staff Engineers, Software Architects or Site Reliability Engineers who solve architectural problems at a much higher level.

The field in which scalability and software architecture are discussed is known as System Design. It is a very different skill from general problem solving or coding. You might think your job performance will be judged by how much good code you write and how good a problem solver you are. That's true, but only for the first 2-3 years. For senior roles, you will be judged on your system architecture skills. Generally, a company expects you to contribute at an architectural level once you have around 2-3 years of job experience. So, if you really want to climb the success ladder, you need a good grasp of these topics.

Here, I will cover the very basics of scalability and reliability. To make things concrete, we will follow a fictional project and see how it scales as its user base grows.

Let's say you have created a social platform named Circle (we will just assume this name for future references), which is very similar to Facebook. This is a hobby project and you have given it to some of your friends to test it. So, your friends like Circle and they start using it from time to time.

Case 1: Very small user base

Let's say Circle is doing well among your friends and currently you have the following metrics:

Total Users	Concurrent Users	Requests/second	Latency
500	100	70-80	1-2 sec

For such a scenario, a $5 single-core server with 1GB of RAM and a managed database should be enough. There is no exact theory or formula that tells you how much load your server can handle. You have to implement monitoring and logging to know whether you need more than one server.

But for the sake of this example, let's say your single $5 server handles the load pretty well with acceptable latency. Latency is the time required to complete a request. Concurrent users are the number of users active on your website at the same instant. You have the following simple architecture.

For future reference, let's name this architecture A0.

Case 2: Small user base

More popularity, more users. With your current A0 architecture, you have the following metrics:

Total Users	Concurrent Users	Requests/second	Latency
5000	2000	700-800	5-6 sec

You may have already noticed that latency is the metric we care about, because it gives a clear idea of how the website is performing. And things are not going well. The latency is not acceptable, and your site is now a slow website. A slow website is as good as a dead website. You also notice that your single server sometimes crashes under the excess load. You have to improve the latency. If you can reduce server load, latency will improve as a side effect (and vice-versa).

So, you add another server to handle more load, and you put a load balancer in front of both servers. The load balancer distributes requests equally between the two servers using the round-robin technique. The architecture is as follows.
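Round-robin simply means handing out servers in a fixed rotation. Here is a minimal sketch with hypothetical server names. Real load balancers such as Nginx or HAProxy implement this (and smarter strategies) for you, so you would not normally write it yourself.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation: server-1, server-2, server-1, ..."""

    def __init__(self, servers: list[str]) -> None:
        self._rotation = cycle(servers)  # endless iterator over the server list

    def pick(self) -> str:
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-1", "server-2"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
# ['server-1', 'server-2', 'server-1', 'server-2', 'server-1', 'server-2']
```

Each server receives exactly half of the requests, which is what "distributing the load equally" means here. Round-robin does not account for how expensive individual requests are.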

Let's name this architecture A1. After implementing it, you see that the latency has dropped to 2-3 seconds, which is acceptable. So, you're good for now.

Case 3: Moderate user base

Circle is not your hobby project anymore. It has gained huge popularity, and there is a sudden spike in your network traffic. You have the following metrics with the A1 architecture:

Concurrent Users	Requests/second	Latency (Actual)	Latency (Expected)
20K	12K	10 sec	<= 2 sec

Before jumping to improvements, note that I have dropped the total user count and will continue to do so from here on. Total user count doesn't affect scalability. What matters is the active users who use the application concurrently, because they are the ones putting load on the servers, not the inactive users.

Let's get back to the metrics. A 10-second latency is far beyond acceptable. You set the expectation that the latency should be at most 2 seconds at this point. What can you do to achieve this? Just add more servers.

After implementing this, you see that the latency has improved to 4 seconds. But this still falls short of your expectation. Maybe you can do something other than adding more servers? Yes, you can. After some observation, you see that your database is under high load all the time.

Previously, we concentrated only on servers, because server performance was the bottleneck, and we simply added more servers to overcome it. Now the database load has become the performance bottleneck. If we don't address it, adding more servers will be of no use. You may suggest adding more databases, just like we did with servers. But that is not preferable at this point, for the following reasons:

Databases are very expensive. A very simple managed database with only 30GB of storage would cost you around $30-40/month
Syncing across databases and keeping data consistent across them is a very challenging task from an engineering perspective

Besides, databases are designed to handle huge loads and we aren't even close to the saturation point for the given scenario.

So, you check your web logs and see that your application is read-heavy. It's a social platform, so people read far more often than they post on their timelines. This is true in many cases, since most general applications are read-heavy.

Fortunately, there is an efficient way to reduce database load on reading operations. You just have to introduce a caching layer.

Caching: Caching stores frequently requested data in memory (RAM), which is much faster to access than a traditional database. If the data is found in the caching layer, there is no need to hit the database. Only when the data is not in the cache is the request forwarded to the database. This mechanism drastically reduces database load and significantly lowers read latency. For our social platform, frequently accessed data might include user feeds, reactions and comments, friend lists, etc., which can easily be cached. We can use Redis as the caching database.

On another note, caching is very difficult to implement correctly. When the data changes, the cache must be updated; otherwise, users will be served stale data. Besides, it's difficult to decide which data should be cached to get the best results.
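The read path and the invalidation problem described above fit together in the common "cache-aside" pattern. This sketch uses a plain dict in place of Redis and a hypothetical `FakeDatabase` that counts how often it is really queried. With Redis you would typically also set an expiry on cached keys, for example with redis-py's `set(key, value, ex=seconds)`, as a safety net against stale data.

```python
class FakeDatabase:
    """Hypothetical slow data store; counts how often it is actually queried."""

    def __init__(self) -> None:
        self.rows = {"feed:alice": ["post-1", "post-2"]}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def set(self, key, value) -> None:
        self.rows[key] = value


class CacheAside:
    def __init__(self, db: FakeDatabase) -> None:
        self.db = db
        self.cache: dict = {}  # stands in for Redis

    def read(self, key):
        if key in self.cache:     # cache hit: the database is not touched
            return self.cache[key]
        value = self.db.get(key)  # cache miss: go to the database...
        self.cache[key] = value   # ...and remember the result for next time
        return value

    def write(self, key, value) -> None:
        self.db.set(key, value)
        self.cache.pop(key, None)  # invalidate, so readers don't get stale data


store = CacheAside(FakeDatabase())
for _ in range(100):
    store.read("feed:alice")
print(store.db.reads)  # 1: only the first of 100 reads reached the database

store.write("feed:alice", ["post-1", "post-2", "post-3"])
print(store.read("feed:alice"))  # ['post-1', 'post-2', 'post-3']: fresh data after invalidation
print(store.db.reads)            # 2
```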

Now with the caching layer, you see that the latency is around 1-2 seconds. The final architecture for this case is given below

Let's name this architecture A2.

Evaluating your service reliability

One day, your service goes down for hours. Your customers are angry, and your platform starts losing money. Your website is not reliable, because it was down for hours. In a real-world scenario, just imagine the impact of Google or Amazon being down for hours.

So, how might such downtime occur? Well, software is unpredictable. There might be downtime due to server maintenance, a bug that crashes the server, etc. In the final architecture A2, can you imagine any scenario in which clients might not be able to connect to your platform at all?

Let's say one of the servers in the server pool goes down due to a crash. There are still many other servers to handle the clients. The load balancer is intelligent enough to re-route new requests to the working servers. Meanwhile, you can fix the bad server, and once it is fixed, it goes back into the working pool. So the server downtime issue is already handled. Can you imagine any other devastating scenario? Is there any flaw in the current architecture?
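This sketch shows the re-routing behaviour described above. It extends round-robin so that servers marked as down by a health check are skipped, and a fixed server is put back into rotation. The class and method names are hypothetical. Real load balancers run periodic health checks (e.g. probing an HTTP endpoint) and do this bookkeeping automatically.

```python
class HealthAwareBalancer:
    """Round-robin over servers, skipping any that failed their health check."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        self.healthy = set(servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        self.healthy.add(server)

    def pick(self) -> str:
        # Walk the rotation at most once, skipping unhealthy servers
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = HealthAwareBalancer(["server-1", "server-2", "server-3"])
lb.mark_down("server-2")              # e.g. it crashed and failed its health check
picks = [lb.pick() for _ in range(4)]
print(picks)                          # ['server-1', 'server-3', 'server-1', 'server-3']
lb.mark_up("server-2")                # fixed and back in the working pool
```

If every server is down, `pick` raises an error. The load balancer itself, however, is still a single machine, which is exactly the weakness discussed next.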

Understanding "Single Point of Failure"

Previously, if one server went down, the load balancer would re-route requests to the other working servers. But the load balancer itself is a server. What happens if the load balancer goes down? If you look at the architecture carefully, the load balancer is the single point that all your clients connect to. The clients only know the load balancer's IP, and the server IPs are never exposed to them. Such a point is called a single point of failure, because if it stops working, your whole system is down. In system architecture, we often have to look out for and eliminate such single points of failure.

To solve this issue, you have to add redundancy to load balancing too, just as you did with the servers.

Now that there are multiple load balancers, how does the client know which load balancer to connect to?

Every service has records in the DNS system. So, if your domain name is circle.com, you can assign multiple IP addresses to it. As you may have guessed, these IP addresses will be the load balancer IPs, so DNS becomes responsible for distributing the requests among them. Now you might ask, what happens if DNS is down? In practice, if DNS is down, much of the internet is down with it. DNS is a globally distributed and highly redundant system, so a complete outage of it is extremely unlikely. The architecture is given below:

Here, we have placed M load balancers and N servers, where M << N. Let's name it A2.5. This architecture has no single point of failure.

What about database failures?

Currently, we have a traditional database and a caching database in the system. These might also go down. But we assume that we are using managed databases. This means the database providers (like AWS or Google Cloud) manage the databases for us. They constantly keep backups and maintain redundancy as required.

Case 4: Large user base

With the previous A2.5 architecture, you can serve a large number of users just by increasing M (the load balancer count) and N (the server count). But at some point, there is a very high chance that the database becomes a performance bottleneck. It's simple: more users, more data. Even databases can get slow when you run queries over, let's say, 500M rows. It might seem like a lot of data, but it is common for very large applications.

Let's go through our current observations, where we assume that the values of M and N are high enough for a large system:

Concurrent Users	Requests/second	Latency (Actual)	Latency (Expected)
100K	80K	6 sec	<= 2 sec

So, this is the time to add some complexity to the data layer by introducing a master-slave (also called primary-replica) architecture.

In a real-life scenario, you would have many more options before making your data layer complex. One example is breaking the application into separate services and scaling each one as required, each with its own database system. But for the sake of learning and simplicity, we are considering a monolith (a single service) with a single database.

Master-slave architecture for the data layer

In such an architecture, there are multiple nodes of the same type. Only one of them, the master node, is tasked with the important responsibility, whereas the slaves are helping hands for the master.

If we apply this architecture to a database system, you will have multiple databases. Among them, one will be master and the others will be slaves. One common practice is given below:

Read operations will be handled by slaves
The master will handle ONLY the write operations
There will be a synchronization service to propagate the write changes from master to slaves
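The three rules above can be sketched as a small router that sits in front of the database nodes. Everything here is hypothetical and in-memory: `Node` stands in for a database server, and `_replicate` stands in for the synchronization service. For simplicity it copies writes immediately. In real systems replication is usually asynchronous, so a slave can briefly return slightly old data (replication lag).

```python
import itertools


class Node:
    """Hypothetical in-memory database node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict = {}


class MasterSlaveRouter:
    def __init__(self, master: Node, slaves: list[Node]) -> None:
        self.master = master
        self.slaves = slaves
        self._slave_cycle = itertools.cycle(slaves)  # spread reads across the slaves

    def write(self, key, value) -> None:
        self.master.data[key] = value  # writes go ONLY to the master
        self._replicate(key, value)

    def _replicate(self, key, value) -> None:
        # Stand-in for the synchronization service (done immediately here for simplicity)
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        slave = next(self._slave_cycle)  # reads are served by the slaves
        return slave.name, slave.data.get(key)


router = MasterSlaveRouter(Node("master"), [Node("slave-1"), Node("slave-2")])
router.write("user:1", "Alice")
first = router.read("user:1")
second = router.read("user:1")
print(first)   # ('slave-1', 'Alice')
print(second)  # ('slave-2', 'Alice')
```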

Let's see what our final architecture looks like with the master-slave database in place:

After implementing this, you see that your latency drops to 2s, so you're satisfied for now. Please note that this architecture is not complete enough for practical use. To keep things simple, I have left out several redundancies and components. Let's name it A3 for future reference.

Case 5: Huge user base

Your users have expanded through different geographical regions. Your new metrics are:

Concurrent Users	Requests/second	Latency (Actual)	Latency (Expected)
500K	300K	5 sec	<= 2 sec

And you have the following observations:

Your servers are located in Asia (let's assume). Your users in the USA experience high latency because of the extra distance the network packets have to travel
The data has grown so large that storing all of it in the master database no longer seems possible

Now, solving this is very complicated and obviously too advanced for a beginner. Besides, there is no "correct" system design. But you have come this far, so I will give you a very high-level idea of how such scaling can be achieved:

Each region (Asia/Europe) will have its own independent architecture
There will be a gateway service for each architecture. When different regions need to talk to each other, they will use the gateway. The gateway service will decide what to do.
There will be a large main data centre that collects all data from the different regions and serves as the single source of truth. It will have its own redundancies.

The simplest presentation of the architecture is given below:

This is as far as I can go in terms of complexity, and it is more than enough for a new learner. I wish I could tell you that at this point you know a lot about system architecture, but to be honest, I can't. System architecture is much more complicated than what is shown here, and we have just scratched the surface. For example, we considered only clients, servers and databases. A real-world large-scale application might have hundreds of different components working together. Interesting, isn't it?

I hope you have learnt something valuable and enjoyed reading this article. Please leave your feedback and questions in the comments below. If you like my articles, don't forget to follow and subscribe. Thank you!

Multiprocessing, Multithreading and GIL: Essential concepts for every Python developer

Ahmed Sadman Muhib — Fri, 04 Jun 2021 13:45:45 GMT

Multithreading and Multiprocessing are ways to utilize 100% of the CPU and create performant applications. If you work with complex web applications, machine learning models or video/image editing tools then you have to tackle multithreading/multiprocessing sooner or later.

Let's say you create an application for editing photos. When the photo has a very high resolution, you will see a very significant drop in performance. Why? Because image editing is mathematically expensive. It puts pressure on the processor. To improve performance, you have to introduce multiprocessing/multithreading. In that case, if a user's CPU has 4 cores, your application should use all 4 cores if required.

Compared to other languages, multiprocessing and multithreading have some limitations in Python. Without knowing them, you might build slow and inefficient systems. The main purpose of this article is to explain these differences.

Let's get started by knowing a little bit more about Multithreading and Multiprocessing

Multithreading vs Multiprocessing

Let's see the basic differences first

Multithreading

A single process, having multiple code segments that can be run concurrently
Each code segment is called a thread. A process having multiple threads is called a multi-threaded process
The process memory is shared among the threads. So thread A can access the variables declared by thread B
Gives the impression of parallel execution, but it's actually concurrency, which is not the same as parallelism. However, threads can run in parallel in a multi-core environment (more on this later)
Threads are easier to create and easier to throw away

Multiprocessing

Multiple processes, working independently of each other. Each process might have one or more threads, but by default it has a single thread.
Each process has its own memory space, so process A cannot access the memory of process B
Two different processes can run on two different cores in parallel, independently of each other
Creating and throwing away processes has a significant overhead

Note that the terms concurrency and parallelism don't mean the same thing. Concurrent execution means tasks can start, progress and complete in overlapping time periods; it doesn't necessarily mean they are running at the same instant. Parallel execution, on the other hand, means two tasks are literally running at the same instant (which requires a multi-core environment).

Also, we have to understand the difference between I/O bound and CPU bound operations.

If an operation depends on I/O (input/output) devices to complete its work, it's an I/O bound operation. For example, network requests, reading from a database or hard disk and writing to a database are all I/O bound.

If an operation depends on the processor to complete its work, it's a CPU bound operation. For example, matrix multiplication, sorting arrays, editing images, video encoding/decoding and training ML models are all CPU bound operations. What they have in common is heavy mathematical calculation, which only the processor can do.

On a single-core processor, thread execution might look like the figure below.

There are two processes, Process 1 and Process 2. While Process 1 is executing, Process 2 has to wait. Threads, on the other hand, can be executed concurrently (not in parallel). For example, if Thread 1 issues a web request, the request can take some time to complete. During that idle time, the CPU is given to Thread 2, which can do its own work (maybe another web request). Please note that thread switching pays off for I/O bound operations. For CPU bound operations, it doesn't improve performance, because there is no idle waiting time to fill.
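To see why thread switching pays off for I/O bound work, here is a minimal sketch. It assumes time.sleep is a fair stand-in for a slow web request (no real network call is made), and fake_request, run_sequentially and run_with_threads are hypothetical helper names. While a thread sleeps it doesn't need the CPU, so two threads can overlap their waiting time.

```python
import threading
import time


def fake_request(delay, results, index):
    """Hypothetical stand-in for a web request: it only waits (I/O-like)."""
    time.sleep(delay)  # while sleeping, this thread doesn't need the CPU
    results[index] = f"response {index}"


def run_sequentially(delay, count):
    results = [None] * count
    start = time.perf_counter()
    for i in range(count):
        fake_request(delay, results, i)  # each wait happens one after another
    return results, time.perf_counter() - start


def run_with_threads(delay, count):
    results = [None] * count
    threads = [
        threading.Thread(target=fake_request, args=(delay, results, i))
        for i in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()  # all the waits now overlap
    for t in threads:
        t.join()  # block until every thread has finished
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = run_sequentially(0.2, 2)
    _, thr_time = run_with_threads(0.2, 2)
    print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

On a typical machine the sequential run should take roughly 0.4 s and the threaded run roughly 0.2 s, because the two waits overlap.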

In a multi-core scenario, thread execution might look like this:

In the above figure, there are 2 processes, each having 8 threads (16 threads in total). As you can see, the threads of Process 1 have expanded to core 2. Threads 1-4 and threads 5-8 (blue boxes, top to bottom) are executed in parallel because they are running on different cores. The same applies to Process 2. Concurrency among the threads on a single core is still preserved. Thus, the computing power available to each process is doubled. This is possible because we used multithreading in a multi-core environment.

Now, when should you use multithreading and when should you use multiprocessing? It depends. A programmer will choose one or the other based on the characteristics described above. If communication between tasks is important, threads might be better because memory is shared. There are more factors involved, but to keep things simple, we will not dive into those. Performance-wise, there is also a subtle difference: creating processes is slower than creating threads, because processes carry extra overhead while threads are lightweight.

In most cases, a C++ or Java programmer will go with multithreading unless there is an absolute need for multiprocessing (by the way, don't confuse the terms multiprocessing and multi-core). But can we say the same thing for Python? Unfortunately, no. Python is different from C++ or Java.

What's so different in Python?

Previously, we saw that threads of the same process might expand to a second core or more if required. Unlike in C++ or Java, Python threads (in the standard CPython interpreter) cannot execute Python code on more than one core at the same time, no matter how many threads you create or how many cores the computer has. Effectively, all the threads behave as if they were running on a single core. Why? To keep the interpreter thread-safe.

We know that memory space is shared between threads. So let's say you have a variable named counter which has a value of 3, counter = 3. Now, if thread A is modifying the counter variable, thread B should wait for thread A to finish. If both of them try to modify the variable at the same time, there will be a race condition and the final value may be inconsistent. So, there needs to be some locking mechanism that prevents the race condition.

Java and C++ use other locking mechanisms to prevent such race conditions. Python (CPython, to be precise) uses the GIL.
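For your own shared data, such as the counter above, Python gives you threading.Lock. Here is a minimal sketch; counter, counter_lock and increment_many are illustrative names, not from any library. Note that the GIL protects the interpreter's internals, not your program's logic: counter += 1 is a read, an add and a write, and a thread switch can happen between those steps, so shared data still needs an explicit lock.

```python
import threading

# Illustrative names: counter, counter_lock and increment_many are made up
counter = 3
counter_lock = threading.Lock()


def increment_many(times):
    """Add 1 to the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        # Only one thread at a time may run this read-modify-write;
        # other threads block here until the lock is released.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40003: the initial 3 plus 4 threads * 10,000 increments
```

Without the lock, some updates can be lost, although whether you actually observe that depends on the Python version and on timing.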

Introducing Global Interpreter Lock (GIL)

We know that Python is an interpreted language, so it runs in an interpreter. The GIL is a type of mutex lock that locks the interpreter itself.

Python uses reference counting for memory management. This means that every object created in Python has a reference count that keeps track of the number of references pointing to it. When this count reaches zero, the memory occupied by the object is released. Let's see an example to make it clearer:

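```python
import sys

a = 'Hello World'
b = a
c = a
# Typically prints 4 in an interactive CPython session; the exact number
# can differ between Python versions and between a script and the REPL
print(sys.getrefcount(a))
```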

In the above example, the string object 'Hello World' has 4 references: the names a, b and c, plus a temporary reference created when a is passed as the argument to sys.getrefcount(a).
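Because the exact number reported by sys.getrefcount depends on the Python version and on where the code runs, a more reliable way to watch reference counting at work is to compare counts before and after adding or removing a reference. A small sketch (refcount_changes is just an illustrative name):

```python
import sys


def refcount_changes():
    """Illustrative helper: measure how binding/deleting a name changes the count."""
    data = []  # a fresh list, referenced only by the name `data`
    before = sys.getrefcount(data)

    alias = data  # a second reference to the same list object
    with_alias = sys.getrefcount(data)

    del alias  # drop that reference again
    after_del = sys.getrefcount(data)
    return before, with_alias, after_del


before, with_alias, after_del = refcount_changes()
print(with_alias - before)  # 1: binding `alias` added one reference
print(after_del - before)  # 0: `del alias` removed it again
```

The absolute numbers are not important; what matters is that every new reference increments the count and every removed one decrements it, which is exactly the bookkeeping the GIL protects.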

Now, this reference counting system needs protection from race conditions. Otherwise, threads might try to increase or decrease the reference count simultaneously, which could cause memory leaks, or release memory while the object is still in use. This is where the GIL comes into play. It's a single lock on the interpreter itself, and it enforces the rule that a thread must acquire the interpreter lock before it can execute any Python bytecode.

Alternative Python Interpreters: One thing to note, the GIL is part of CPython, the official and most popular implementation of Python. CPython is written in C, and it's what you download from the Python website. There are other Python interpreters, such as Jython (written in Java), IronPython (written in C#) and PyPy (written in RPython, a restricted subset of Python). Jython and IronPython don't have a GIL, while PyPy does have one. However, these alternatives are much less popular, and many libraries (especially those relying on C extensions) don't support them. Later in this article, you will also see why the GIL, despite seeming obstructive, was chosen as the best solution.

The impact of the GIL on multithreaded Python programs

As the GIL locks the interpreter itself, parallel execution of Python code is not possible within a single process. So even if you create one hundred threads, only one of them can execute Python bytecode at any moment, and effectively they all share a single core. The following figure clarifies how it would look:

For C++, a hundred threads have been distributed across four cores. For Python, all one hundred threads are effectively running on the same core because of the GIL.
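You can observe this effect with a rough, machine-dependent sketch. It runs the same CPU-bound function (a hypothetical count_primes helper using simple trial division) four times sequentially and then on four threads. On CPython with the GIL, the threaded version is typically no faster than the sequential one, and can even be a bit slower; exact timings vary from machine to machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Hypothetical CPU-bound helper: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed(label, func):
    """Run func(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result


if __name__ == "__main__":
    limits = [30_000] * 4
    seq = timed("sequential", lambda: [count_primes(n) for n in limits])
    with ThreadPoolExecutor(max_workers=4) as executor:
        thr = timed("4 threads", lambda: list(executor.map(count_primes, limits)))
    assert seq == thr  # same answers; only the timing differs
```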

The Remedy

It is possible to run a Python program utilizing all the cores available to you. How? Using multiprocessing instead of multithreading. For the record, I will show how to easily create a thread pool and process pool in Python. Of course, there are other ways to create threads and processes. But the one I'm going to show is more than enough in most cases.

In Python, when to use multithreading and when to use multiprocessing?
If your program does I/O heavy tasks, go for multithreading. Multiprocessing would be a bad idea here because of its extra overhead. On the other hand, if your program does heavy mathematical calculations, go with multiprocessing. Using threads in this case can even reduce performance. If your program does both I/O and CPU related tasks, then you have to use a hybrid of both.

Multithreading with `ThreadPoolExecutor`

Let's say you have a program that scrapes web pages. Web requests are I/O bound operations, so threads are perfect here. Look at the code below:

```python
from concurrent.futures import ThreadPoolExecutor, wait


def scrape_page(url):
    # ... scraping logic
    # remove the following line when you have written the logic
    raise NotImplementedError


def batch_scrape(urls):
    tasks = []
    with ThreadPoolExecutor(max_workers=8) as executor:
        for url in urls:
            # executor.submit takes the function to execute as its first argument;
            # all the arguments after that are passed to that function
            tasks.append(executor.submit(scrape_page, url))
    wait(tasks)


if __name__ == "__main__":
    urls = ['https://google.com', 'https://facebook.com']
    batch_scrape(urls)
```

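In this example, the pool uses up to 8 worker threads, which we set with the max_workers argument. Threads are created as tasks are submitted, so with only two URLs, no more than two threads will actually be started.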
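The example above discards whatever scrape_page returns. If you need the results, one option is executor.map, which yields results in the same order as the inputs. A minimal sketch, assuming a hypothetical fetch_title function that simulates a request with time.sleep instead of making a real HTTP call:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_title(url):
    """Hypothetical stand-in for scraping: waits like a network call, then
    returns a fake 'title' derived from the URL."""
    time.sleep(0.1)  # simulated network latency (I/O wait)
    return url.split("//", 1)[1].upper()


def batch_fetch(urls, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() runs fetch_title on the pool's threads and yields results
        # in input order; an exception raised in a worker is re-raised here
        return list(executor.map(fetch_title, urls))


titles = batch_fetch(["https://google.com", "https://facebook.com"])
print(titles)  # ['GOOGLE.COM', 'FACEBOOK.COM']
```

A side benefit compared to submit plus wait: if a worker raises (like the NotImplementedError placeholder above), iterating over the map results re-raises that exception instead of leaving it silently stored inside a future.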

Multiprocessing with `ProcessPoolExecutor`

Fortunately, ThreadPoolExecutor and ProcessPoolExecutor use the same interface, so the code is almost identical to the previous one. For this example, we will assume we're encoding video files, which is a CPU intensive task:

```python
from concurrent.futures import ProcessPoolExecutor, wait


def encode_video(file):
    # ... encoding logic
    # remove the following line when you have written the logic
    raise NotImplementedError


def batch_encode(files):
    tasks = []
    with ProcessPoolExecutor(max_workers=4) as executor:
        for file in files:
            tasks.append(executor.submit(encode_video, file))
    wait(tasks)


if __name__ == "__main__":
    file_paths = ['file1.mp4', 'file2.mp4']
    batch_encode(file_paths)
```

Here, the pool uses up to 4 worker processes. They can run in parallel because each process has its own interpreter, and therefore its own GIL. Thus, the GIL limitation doesn't matter in this case.
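To see a process pool do real CPU-bound work, here is a sketch that swaps the unimplemented encode_video for a hypothetical sum_of_squares function (plain arithmetic standing in for encoding). Sizing the pool with os.cpu_count() is a common choice but an assumption here. The pool stays inside the if __name__ == "__main__": guard, which multiprocessing needs on platforms that start worker processes by re-importing the main module (such as Windows and macOS).

```python
import os
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """Hypothetical CPU-bound stand-in for a heavy job such as encoding one file."""
    return sum(i * i for i in range(n))


def batch_compute(sizes):
    # One worker per CPU core; os.cpu_count() can return None, so fall back to 1
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # Each call runs in a separate process with its own interpreter and GIL
        return list(executor.map(sum_of_squares, sizes))


if __name__ == "__main__":
    print(batch_compute([10, 100, 1_000_000]))
```

The worker function is defined at module level on purpose: arguments, functions and results are pickled to move between processes, and that requires the function to be importable by name.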

Why GIL?

Now that you know how to run your Python program on multiple cores, you might wonder why Python uses the GIL instead of other solutions. Also, why not just remove the GIL?

Python is a very popular and widely used language. Along with its many good sides, it has some bad sides, like the GIL (or is it really a bad side? Let's see). Arguably, if it weren't for the GIL, Python would not be as popular as it is today. Let's see the reasons:

Other languages like Java and C++ use different locking mechanisms, at the cost of decreased performance for single-threaded programs. To overcome the single-threaded performance issue, some runtimes rely on techniques such as JIT compilation (Java's JVM is an example)
If you add multiple locks, there is a risk of deadlocks. Also, constantly releasing and acquiring locks creates performance bottlenecks. It's not easy to overcome these problems while keeping Python's convenient language features. The GIL, by contrast, is a single lock and simple to implement.
Python is popular and widely used largely because of its support for C extension libraries, and those libraries needed a thread-safe solution. Because the GIL is a single lock on the interpreter, there is no chance of deadlocks between interpreter locks, and it's easy to maintain. So, ultimately, the GIL was chosen to support all those C extensions
Developers and researchers have tried to remove the GIL in the past, but they saw a significant performance drop for single-threaded applications, and most general applications are single-threaded. Also, the C libraries that Python heavily depends on broke completely. A change as fundamental as removing the GIL can't be made without causing backward compatibility issues or slowing down performance. Still, researchers keep working on getting rid of the GIL, and it's a topic of interest for many.
In practice, the GIL's limitations have not been a major obstacle to writing large and complex applications. After all, multiprocessing is still there to solve such problems, and today's computers have enough resources and memory to handle the overhead of multiprocessing.

That's it for today. I hope you learnt something new and interesting. If you like my article, please follow and subscribe to my blog. Also, if you have any questions or feedback, drop a comment below. Thank you.

Starting Web Development in 2022: The Complete Guide (with Resources)

Ahmed Sadman Muhib — Mon, 31 May 2021 15:28:52 GMT

If you're a beginner and want to have a tech career, you might be confused about all the career specialization choices. There are things like Machine Learning/AI, Data Science, Game Development, Quality Assurance, Cyber Security and of course, Web Development.

If you have already made up your mind and you have your reasons to choose a certain career path (excluding Web Dev), then, this post is not for you. Otherwise, I will try to convince you why you should start with Web Development as the first step:

It has a high demand in the job market
Web is the most accessible platform: anyone, anywhere can access it. All they need is an internet connection and a device with a web browser. If you make an Android app, it will only run on Android devices. But if you make a web app, everyone can use it
No matter which tech career you choose, you will be directly or indirectly involved with web development. Even as an AI/Machine Learning Engineer, you would still need basic knowledge of web development. So, learning web dev is a good investment in any scenario, even if you later switch to Data Science or AI.
Ever used Uber? Of course you have (or at least some similar service). What does it do? It provides a ride-sharing platform and nothing more. Although the app looks very nice, it's actually the backend code that does all the magic, and that also falls under the web category. Uber even has a dedicated engineering organization, Uber Engineering, that invests heavily in solving hard software engineering problems, and most of that work goes into making the 'ride-sharing' platform better, the one that might have seemed 'simple' to you a few minutes ago. Interesting, right? A basic app might only involve storing some data in a database, doing some CRUD operations (don't worry if you don't know the term) and making a nice-looking web page. But you would be stuck if you were given a complicated system like Uber, which runs at a large scale with millions of users at once.

The gist is, web development can be very challenging and interesting work. But to face those challenges, you first need to get a job as an engineer, so let me guide you on that path. I am a Software Engineer working on a web-based SaaS product, so I hope my experience will be useful to you.

Inspired by the well-known Developer Roadmaps, I will share a mini roadmap with narrowed-down choices (so that it's much easier for you to decide), along with my own experience and learning resources, so that you don't have to waste time compiling these things yourself or get lost in the vast sea of resources.

Why follow my choices? Well, the technologies I listed here are widely popular and in high demand. If you can master the whole roadmap, you will be able to pick up any other technology in the future if required.

But before doing that, let's get familiar with some job titles that relate to web development:

Frontend Engineer (Client-side)

Turn web designs (provided by a UI designer) into code.
Implement user interactions like button events, visual feedback etc.
Implement the data model and data state. Decide how data is going to be stored and manipulated throughout the application

Backend Engineer (Server-side)

Design and implement database schema
Make system architecture decisions. Decide how to scale your application up to thousands or millions of users
Implement business logic for your unique product (like what happens on the server-side when a customer requests a ride in Uber)
Use tools and technologies to provide data insights

DevOps Engineer

Bridges the gap between the Development team (who write the code) and the Operations team (who run the code). In simple terms, a backend/frontend engineer writes the code, while the DevOps engineer's main responsibility is to serve the written code to customers
Automates the deployment pipeline using CI/CD (Continuous Integration and Continuous Delivery/Deployment). By deployment, I mean everything from writing the code to serving it to end customers (part of the SDLC, the Software Development Life Cycle).
Manages the servers and network, and ensures systems run properly without downtime
Not directly involved with the implementation of the product

Full-Stack Software Engineer

Works in both Frontend and Backend
In most companies, the title is generally 'Software Engineer'
Small to moderate companies offer full-stack roles. For bigger companies, the product is so complex that a full-stack role is not possible. In those cases, they offer specific Backend or Frontend positions

Site Reliability Engineer (SRE)

Engineers who solve operational/scale/reliability problems and maintain infrastructure
It's an overlap of DevOps and Backend Engineering, or we can say a glorified form of DevOps Engineering
Has some involvement with the software product by providing reliability consultations (like which DB to use or not to use, how to scale properly etc.)
Not really involved in writing code for the actual software, rather routine maintenance work

Please note, depending on your company, you might simply be called a Software Engineer regardless of which of the above categories you fall into.

In this article, I will guide you through the bare bones, which are Frontend and Backend. These guidelines will be a starting point. I will also link a post of mine where you can get an idea of DevOps. There is no separate career path for SRE: just learn Backend Engineering and DevOps properly, that's it.

The Roadmap

If the image doesn't render clearly, please use this direct link to the Roadmap. Please go through the roadmap carefully, as we will be referencing it for the rest of the article. Following only one Backend path, completing the roadmap should take around 6-8 months. Also, these are the bare bones of web development, and there are a lot of things you will have to learn later. But for now, this roadmap is perfect for beginners.

Pick a Programming Language

This is the first step. As I have listed, my preferred choices are Python and JavaScript. These two are very promising languages, and they aren't going away anytime soon. Let me share the benefits of picking either of them:

Python

It's easy to learn but difficult to master. Its syntax reads almost like natural language
Has powerful and battle-tested (read 'mature') web tools and frameworks
Great community
If you're thinking of switching to Machine Learning, AI or Data Science later, go with Python without a second thought, because Python is literally the 'heart' of those fields

JavaScript

One of the most popular and widely used languages
Great community
A language with diverse possibilities. If you learn it, you can do web frontend, server programming (backend), cross-platform mobile app development, desktop app development, game development, and even Machine Learning (experimental) and Embedded Systems programming, although I would not recommend it for the last two. But you get the picture: this language has the most diverse set of use cases.
Jeff Atwood, one of the founders of the life-saving platform (for devs) Stack Overflow, said this back in 2007

Any application that can be written in JavaScript, will eventually be written in JavaScript. - Jeff Atwood

Popularly known as Atwood's Law, this is largely true nowadays. Recently, Google released zx, a JS scripting library that might be capable of replacing Bash/Shell scripting in the future (just my personal opinion, but it looks very promising).

So make a decision and pick one language. Either is fine. Remember, the language you pick will affect your choice for the Backend section. Eventually, you should aim to cover the whole roadmap, not just a specific path. So, look into the benefits, decide which language suits you better and start learning.

Okay, if you still can't decide, go with JavaScript.

You might have heard of Node.js. It isn't a separate programming language: it's a runtime that lets you run JavaScript outside a web browser. The language itself is the same JavaScript; what changes is the environment (for example, Node.js lets you access the file system, while browser-specific features like the DOM aren't available).

Learning resources are provided at the end of the article

Design Patterns

Design patterns are typical solutions to common problems in software design. Each pattern is like a blueprint that you can customize to solve a particular design problem in your code

Design patterns are one of the most valuable yet most underrated skills. They relate to how you structure your code in an object-oriented design. In some cases, you will not use these predefined patterns directly. But you need them to understand common OOP principles, and you will often use customized versions of these patterns when working with large codebases.

Unless you have a thirst for learning, you don't need to go through all the patterns. Just focus on those listed below:

Creational Patterns: Factory, Abstract Factory, Singleton, Builder

Structural Patterns: Adapter, Decorator, Facade

Behavioral Pattern: Command, Observer, Visitor
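To give a taste of what these patterns look like in code, here is a minimal Python sketch of the Observer pattern. The class name NewsletterPublisher and the subscriber callbacks are made up for illustration; real-world implementations vary a lot.

```python
from typing import Callable, List


class NewsletterPublisher:
    """Subject (hypothetical example): keeps a list of observers and notifies them."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def unsubscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.remove(callback)

    def publish(self, title: str) -> None:
        # The publisher doesn't know what observers do; it only calls them
        for callback in self._subscribers:
            callback(title)


inbox: List[str] = []
publisher = NewsletterPublisher()
publisher.subscribe(lambda title: inbox.append(f"email: {title}"))
publisher.subscribe(lambda title: inbox.append(f"push: {title}"))
publisher.publish("New article is out")
print(inbox)  # ['email: New article is out', 'push: New article is out']
```

The key idea is loose coupling: the publisher can notify any number of observers without knowing anything about them beyond "it can be called with a title".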

Don't know what Object Oriented Programming is? No worries, follow the roadmap and it will come of its own accord

Database

Databases are used to store information. There are many database choices, mainly divided into SQL and No-SQL categories. I have made the choice easier for you: you're a new learner, so it doesn't matter which database you learn first. Ideally, you should learn both MongoDB and Postgres (at least at some point in the future).

You need to be fluent with at least one database technology before jumping into web dev. If you chose Python earlier, take Postgres (SQL). Otherwise, take MongoDB (a No-SQL database). Just so you know, you can use SQL databases with JavaScript too. I recommend MongoDB because it stores JSON-like documents, so its syntax and usage feel natural in JavaScript. Also, most JS-based web dev tutorials cover MongoDB by default. Although relatively new, MongoDB is being widely adopted by many companies, and it avoids some design limitations of traditional SQL databases (for example, you don't have to define a fixed schema up front). But don't think one is superior to the other; both have their own advantages and disadvantages.

Version Control with Git and GitHub

Any software is subject to change; it can change daily, or even hourly. This is why Software Engineering exists in the first place. A version control system like Git, together with a hosting platform like GitHub, lets you record changes in your codebase and revert to a previous point if something fails. It also gives you the tools you need to collaborate with other developers. It's must-have knowledge no matter which tech sector you work in.

Frontend

You have to start with Frontend. You need to learn HTML, CSS, SCSS (CSS on steroids) and React. React is a JavaScript library (often called a framework) that makes front-end development much easier. There are other frontend frameworks too, but in my opinion React is the best choice: it has the best features of the bunch and is in high demand in the job market.

Backend

Now you have to choose one of the paths, depending on what you picked in the Language section. We will focus on how to build REST APIs, because this convention is widely used in all kinds of mobile and web applications you see around you. All the backend tutorials provided cover REST APIs.

Python Path

Flask is a Python framework for writing web applications. It's easy to learn and very extensible. There's another framework named Django, but Django automates a lot of stuff. Personally, I prefer Flask over Django: Flask gives you a lot of control, and starting a project with Flask is much easier. Because Flask leaves so much to you, you will learn a lot by implementing those things with your own hands.

As for database choice, there is only Postgres. Please note, you can use any database with Python. Not just Python, you can use ANY DATABASE with ANY LANGUAGE. But in this path, I have selected Postgres. Later when you master the whole roadmap, you can also use MongoDB with Flask if you want

ORM (Object Relational Mapping): ORM is an abstraction layer that sits on top of the database. Using ORM, you can use the programming language's own feature to manipulate and access data, instead of using a query language like SQL.

Let's say you want to select all students from the students table whose age is greater than 18, and assume you're using a MySQL database. Before fetching the data, let's create a table where the data will be stored. We have to use SQL to issue the commands:

```sql
CREATE TABLE students (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  age INT
);
```

Running this SQL will create a table. Now to store some data:

```sql
INSERT INTO students VALUES (1, 'John', 17);
INSERT INTO students VALUES (2, 'Doe', 19);
```

Now, to fetch the students whose age is greater than 18

SELECT * from students WHERE age > 18;

The above is an SQL query (it works in MySQL as well as in most other SQL databases). SQL is quite similar to natural language. The given query will return all students whose age is greater than 18.
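If you want to try these statements without installing MySQL, one option is Python's built-in sqlite3 module. This is an assumption for the sake of a runnable example: SQLite is a different database engine, but it accepts essentially the same statements. The sketch uses an in-memory database that disappears when the connection closes.

```python
import sqlite3

# In-memory SQLite database (stand-in for MySQL here): nothing is written to disk
conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(50), age INT)")
conn.execute("INSERT INTO students VALUES (1, 'John', 17)")
conn.execute("INSERT INTO students VALUES (2, 'Doe', 19)")
conn.commit()

rows = conn.execute("SELECT * FROM students WHERE age > 18").fetchall()
print(rows)  # [(2, 'Doe', 19)]
conn.close()
```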

Now, if you use an ORM in your Python application, you don't need to write SQL by hand. Instead, you will do something like this to define a table (Python example, assuming Flask-SQLAlchemy's db object):

```python
class Student(db.Model):
    __tablename__ = 'students'  # map this class to the students table
    id = db.Column(...)
    name = db.Column(...)
    age = db.Column(...)
```

The above is a class, which you will learn about in the OOP portion of your selected language.

Inserting records

```python
student = Student(name='John', age=18)  # id will be assigned automatically
db.session.add(student)
db.session.commit()
```

To get all students whose age is greater than 18

Student.query.filter(Student.age > 18).all()

As you can see, an ORM allows you to use Python syntax to query and manipulate the database. You don't even need to write SQL yourself (although you should still learn it; otherwise, you won't know how to use an ORM efficiently).

More importantly, using an ORM saves you from a lot of trouble. For example, it helps prevent SQL injection, a common attack, because values are sent to the database as bound parameters instead of being pasted into the query text. Also, you can switch to a different SQL database later. For example, let's say we're using MySQL. If we later want to switch to Postgres, we don't have to change the above code at all. How does the ORM do that? Well, it's just an abstraction. Based on which database engine you're using, it generates the equivalent SQL dialect for you. So, the ORM will convert the Python statement Student.query.filter(Student.age > 18).all() into something equivalent to

SELECT * FROM students WHERE age > 18;
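The SQL injection point deserves a concrete illustration. The protection comes from bound parameters: values travel separately from the SQL text instead of being pasted into it. A minimal sketch with the standard-library sqlite3 module (not an ORM; the attacker input is made up) shows the difference between building a query with string formatting and using a ? placeholder:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "John", 17), (2, "Doe", 19)],
)

user_input = "x' OR '1'='1"  # made-up malicious "name" typed into a search box

# UNSAFE: the input becomes part of the SQL text, so the OR clause runs
unsafe_sql = f"SELECT name FROM students WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the input is passed as a parameter and treated purely as a value
safe_rows = conn.execute(
    "SELECT name FROM students WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
conn.close()
```

The unsafe query leaks every row because the input rewrote the WHERE clause; the parameterized one finds nothing, since no student is literally named x' OR '1'='1.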

Not just Python: almost every language offers some kind of ORM. For Python, the most popular one is SQLAlchemy, and you will have to learn it.

Node.js Path

Don't worry about the title: as I stated earlier, Node.js runs the same JavaScript language. For server programming, you have to get out of the browser environment, which is why calling this path Node.js seems appropriate. In this path, the web framework is Express and the database is MongoDB. MongoDB is a No-SQL database that is quite different from a SQL database. At some point in the future, you should know both MongoDB and Postgres.

The ODM (Object Document Mapping) library here is Mongoose. An ODM plays the same role as an ORM, but since MongoDB is a No-SQL (document) database, some terms and definitions change; for example, you work with documents and collections instead of rows and tables.

Clean code and refactoring (Optional)

This one is for the curious-minded. It will definitely help when you start your first job, but you can go through it much later if you want. Clean code and refactoring are skills that come through experience, but you still need to know some concepts so that you're headed in the right direction.

Resources

The section you've been waiting for! Here, most of the resources I share were used for my own learning. For some parts, you have to follow multiple tutorials. Please note, order matters in those cases.

Many Udemy resources are paid. Although pirated content is available, I encourage you to pay for the courses, because they're worth the investment. So, here we go:

Language

Python

The Python Mega Course

JavaScript

The Complete Javascript course by Jonas

Please note, this course also includes some basic HTML and CSS. To learn JavaScript, there is no way to avoid HTML and CSS, but no worries, the instructor will guide you through them smoothly.

Clean Code and Refactoring

Refactoring.guru

Design Patterns

Refactoring.guru: Design Patterns

Database

Postgres

MySQL Database Bootcamp

MySQL and Postgres are very similar in syntax, although each has its own features and dialect differences. To my knowledge, there's no good tutorial for Postgres, so follow the MySQL one.

MongoDB

MongoDB - The Complete Guide by Academind

Git and GitHub (Version Control)

As you can see, there are multiple resources. You have to go through all of them, in order. Don't worry, these courses are very small. It would take you at most 1 week to master the basics.

Frontend

HTML, CSS, SCSS

React

React 16: The Complete Guide by Academind

Redux and Redux Thunk are used less often in new projects nowadays, and they can be difficult to understand at first. React's built-in Context API (often combined with the useReducer hook) covers many of the use cases people used to reach for Redux for. So, you can safely skip or skim through the Redux-related sections.

Backend

Python Path

Build REST APIs with Flask (Very average tutorial, misses a lot of things. But still a bit better than others. You have to follow it because it shows some modern techniques)
Flask Mega tutorial by Miguel Grinberg (Read the whole series. This will be the series that you will take as reference point while writing Flask code. Forget the bad practices introduced to you in the previous tutorial)

Just a reminder, where you see these numbered tutorials, you have to follow all of them, in order. These are not options, but mandatory.

Node.js Path

The given tutorials cover everything from the roadmap.

Where to go from here

Now you have a solid foundation for building robust web solutions. Soon, I will release another article with a roadmap of advanced topics. Stay tuned! For now, you can look into the Cloud section; my blog post titled Cloud Deployment for absolute beginners is a great starting point.

Some tips for new learners

Don't rush. Take your time to learn these things. Learn everything thoroughly
Do Projects. This is extremely important. Don't fall into the learning loop. Be ready to get your hands dirty as soon as possible
Every project you do, keep it in GitHub. This is extremely important
Try to get involved in open-source contributions. It will drastically boost your profile
Don't forget that Data Structures and Algorithms are important and might come up in company interviews. This is not a Computer Science roadmap, so discussing algorithms is out of scope, but you should have a basic understanding. You can use LeetCode (Easy + Medium levels) to practice interview questions.
Don't waste time on Coursera certificates. Instead, invest time in doing projects. So far, I haven't seen any company that judges a candidate by their Coursera certificates. I'm discouraging this because many people try to collect as many certificates as possible without learning anything valuable. Even for a beginner, a company will look at how much real experience you have building things and solving problems, not the fancy certificates you showcase on LinkedIn.
It's much better to pick a few things and master them than to be a jack of all trades
There is no universal tech stack that can get you a job in any company. When hiring beginners, most good companies are tech agnostic
Even if you don't know a particular tech stack, you can learn it easily, because now you have a strong foundation

Thank you for reading this far. If you have any questions or feedback, please don't hesitate to drop a comment. This will encourage me a lot.

If you liked this article, please share it with your circle. Also, don't forget to follow my blog. Have a nice day!

Cloud deployment for absolute beginners

Ahmed Sadman Muhib — Sat, 22 May 2021 13:35:54 GMT

Let's say you have created an awesome web application that uses machine learning to suggest movies you might like. Now what? You have to share it with the world so that people can use it, right? This is where deployment comes into play.

In this post, I will give a high-level idea of cloud deployment. You will be introduced to different terminologies and use cases. I will refrain from going into the low-level technical details of deployment (like how to host a Node.js web app). There are already a lot of articles available for that, and I have linked them in this post.

Besides, there are a lot of posts explaining the deployment process, but those articles rarely give the whole picture of cloud deployment in a beginner-friendly way, hence this post. If you're new to cloud computing, you will find it worthwhile. After reading this, I hope much of your confusion will be cleared up.

The basics you need to know:

I assume you already know basic web development and you want to deploy your application
A good command of Linux

Things that are discussed in this post:

What is cloud deployment and how does it work?
As a beginner, where should I start learning?
Which cloud provider to use?
What is the difference between Heroku and AWS?
What are localhost and private networks?
NGINX/Apache? What are these?
Networking essentials

...and many more

What do you mean by Cloud Deployment?

Well, to deploy your application you can do the following:

1. Buy a server computer
2. Get a public IP to expose the server on the internet
3. Configure the OS to run your stack. For example, if your code is written in Python, install the necessary packages (like Flask, Django etc.) and configure them properly
4. Configure a web server to serve your application on the internet
5. Keep your server running 24/7 at any cost, because if your server is down, your application will be unreachable

Seems scary, right?

On top of that, as your application grows, you will need more and more servers to handle the load, so now you have to maintain multiple servers.

The above scenario is almost obsolete nowadays. Unless you're handling very sensitive data (like state secrets), you will be using a cloud provider to host your application on the internet.

So, basically, a cloud provider solves points 1, 2, 3 (partially) and 5 for you. Using a web UI, you can spin up a new computer in a few minutes. It comes configured with an OS, has a public IP and is never down (well, almost never). This computer resides in a remote place, in a large data centre among many other computers (like particles in a cloud). We can see it (read: access it) but don't know exactly where it is located, just like a cloud in the sky, hence the term 'Cloud Computing'.

How do they do it? They have enough money and manpower to manage millions of computers and other resources at once, and they offer them to you as a service through sophisticated virtualization and automation.

There are currently many cloud providers. The leading ones are Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, DigitalOcean etc. If you're already tinkering with deployments, you might have also heard of Heroku, Netlify and Vercel. There are very clear differences between these two types of providers.

Heroku vs AWS

You can host your web application using both of these providers. But in Heroku, automation is the key. Heroku will automatically configure the OS and the tech stack for you. You just have to tell Heroku that you want to deploy a Node.js application and that's it! So it solves point 3 from above, along with the others. It can also automate the deployment process using GitHub, so that each time you push some code, your application is automatically deployed.

But this is not cloud deployment in the full sense. First of all, you will have little to no control over how your application is deployed and maintained.

Secondly, these providers don't actually give you a computer/server, so you cannot remotely access it and do custom things on your own. They just provision some resources to run your web application and abstract away everything else.

Thirdly, such providers are much more costly than AWS or Google Cloud Platform because of the extra automation. For example, if you want to run a production application on Heroku, you will have to pay a minimum of $5 per month for each application. On the other hand, DigitalOcean will give you a server for $5 per month where you can run 2-3 applications at once.

If you just want to host a hobby project, Heroku/Netlify is the best option, because their basic usage tier is FREE. But if you really want to get into the world of deployments (DevOps), or even kickstart your Software Engineering career, you should get your hands dirty with AWS, DigitalOcean or similar providers.

Remember, using cloud providers requires some idea of basic Linux commands. No, they won't give you nice little graphical interfaces to work with. The servers are mostly Linux-based, and you will run all your operations through the command line, just like the good old days.

Now the question: where should I start?

Well, pick a cloud provider, spin up a server and start tinkering with the command line. See how things work. Maybe follow a tutorial on how to configure your freshly created server. But before that, let's decide which cloud provider to use.

Why DigitalOcean?

Contrary to popular opinion, I would advise starting with DigitalOcean.

Pros:

It's much cheaper compared to other providers
Has a clean and easy-to-use UI
Has a lot of documentation written in a beginner-friendly way (most important point)

Cons (don't worry about these now):

Has few services (for example, no support for automatic load balancing)
Offers a smaller range of computing resource tiers, so there are fewer options compared to other providers
Some advanced networking and access management might not be possible or might have to be done manually
Availability is not as reliable as GCP or AWS.

If you're a beginner, there's a chance that the cons went over your head. At the entry stage, they can be ignored.

All cloud providers give you one core thing: a computer system to work with. So, in terms of learning, there's no real difference. If you can work with DigitalOcean, you can work with AWS, Google Cloud Platform (GCP) or any other provider. The only differences are their UI, access management and the extra services they provide.

[Figure: DigitalOcean pricing comparison for a 1-core, 2 GB memory instance]

How to use DigitalOcean?

I'm not going to give you a step by step guide. If you're reading this, you should already know how to register for a service and add a credit card for payment.

You will be billed on an hourly basis. So if a specific computer costs $0.015/hour and you keep it running for a whole month (24/7), it would cost you roughly $10 at the end of the month.
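The billing arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not DigitalOcean's actual billing logic: `estimate_monthly_cost` is a hypothetical helper, it assumes a 30-day month of 24/7 uptime, and the optional `monthly_cap` models providers that stop charging once the hourly total reaches the plan's monthly price (an assumption; check your provider's pricing page for the real rules).

```python
from typing import Optional

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month, running 24/7


def estimate_monthly_cost(hourly_rate: float,
                          hours_running: float = HOURS_PER_MONTH,
                          monthly_cap: Optional[float] = None) -> float:
    """Estimate the bill for hourly-billed compute, rounded to cents."""
    cost = hourly_rate * hours_running
    if monthly_cap is not None:
        # Hypothetical cap: never charge more than the plan's monthly price.
        cost = min(cost, monthly_cap)
    return round(cost, 2)


if __name__ == "__main__":
    print(estimate_monthly_cost(0.015))                    # 10.8
    print(estimate_monthly_cost(0.015, monthly_cap=10.0))  # 10.0
```

Uncapped, $0.015 x 720 hours comes to about $10.80; with a cap at the plan price, the bill stops at $10.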

In the DigitalOcean world, a single computing instance is called a Droplet. When you create a Droplet (a computer, or a computing instance), you can access it and do whatever you want with it (like hosting web apps, playing with the terminal or even breaking the whole system apart 😉).

Now, how do you create a droplet? Well, there is already a well-written guide on the internet and there's no point in rewriting it, so follow this guide on how to create your first droplet. Awesome documentation, right?

What distribution/image to choose?

We love Linux for servers. So I would recommend, from the Distributions tab, creating a plain Ubuntu machine with the cheapest plan ($5) for a start. This will give you a vanilla server to start with.

You could also choose a pre-built image from the Marketplace; then you wouldn't need to manually configure your server (discussed below). For example, if you're hosting a WordPress website, you could choose the WordPress image, and the newly created server would come with WordPress installed and all the necessary things configured.

But we are learners. We don't want things automated for us, because then there is nothing to learn. So, don't use Marketplace images for now.

What is SSH?

Now that you have created your first remote computer, you have to access it using SSH (Secure Shell Protocol). In simple terms, SSH is a cryptographic network protocol that securely connects you to a remote computer, which you can then operate through a terminal.

Let's connect

Now, let us see how to connect to your first droplet using SSH.
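The step-by-step connection walkthrough isn't reproduced here, but the usual command has the form `ssh root@<droplet-ip>` (DigitalOcean's Ubuntu droplets typically log in as root by default). As a minimal sketch, the hypothetical helper below builds that OpenSSH command from Python. `-i` (identity file) and `-p` (port) are standard `ssh` options; the IP address and key path are placeholders.

```python
import shlex
from typing import List, Optional


def build_ssh_command(host: str, user: str = "root",
                      key_path: Optional[str] = None,
                      port: int = 22) -> List[str]:
    """Build the argument list for the OpenSSH `ssh` client (hypothetical helper)."""
    cmd = ["ssh"]
    if key_path:
        # Private key matching the public key you added to the droplet.
        cmd += ["-i", key_path]
    if port != 22:
        # Only needed if the SSH server listens on a non-default port.
        cmd += ["-p", str(port)]
    cmd.append(f"{user}@{host}")
    return cmd


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only address; use your droplet's public IP.
    cmd = build_ssh_command("203.0.113.10", key_path="/home/me/.ssh/id_ed25519")
    print(shlex.join(cmd))  # ssh -i /home/me/.ssh/id_ed25519 root@203.0.113.10
    # To actually open the session: subprocess.run(cmd)
```

Printing the command first is a cheap way to double-check it before running it in your terminal.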

Well done! You're connected to your remote computer/server. Now it's time to configure your server. Yes, a server needs some basic configuration to start with.

Okay, you have a server set up properly. Now, you have to install the necessary software to run your web application. For example, if you want to run a Python application, you have to install Python and the necessary packages.

Until now, you have probably hosted your application on localhost using a development server. Now you're on an actual remote server and want the whole world to see your app (or maybe not the world, just you, so that you can test it), so a few things change in how you deploy your application.

I will provide you with some documentation on how to deploy your applications on DigitalOcean. I suggest digging into these articles only after finishing this one. Although they are written by DigitalOcean people, you can use these docs to deploy your apps to any server from any provider:

Don't forget to praise DigitalOcean's nicely explained documentation.

Server networking 101

The documentation above looks nice, but as a learner, you have to be familiar with some basic server networking terminology.

localhost

Localhost, accessed by the hostname localhost or the IP address 127.0.0.1, refers to 'this' computer, meaning the computer you're currently working on. It's useful for testing applications because the traffic never leaves the machine: nothing is sent over the internet. A server you run locally behaves just like one on a remote machine, so you can check your program's functionality before deploying it.

This mechanism, where traffic addressed to localhost is routed straight back to your own computer, is called loopback.
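To see loopback in action, the sketch below starts a tiny HTTP server bound to 127.0.0.1 and fetches a page from it, all within one Python process using only the standard library. Port 0 asks the OS for any free port; `HelloHandler` and `fetch_over_loopback` are illustrative names, not part of any framework.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from localhost"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_over_loopback() -> bytes:
    # Bind to the loopback address only; port 0 lets the OS choose a free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The request goes to 127.0.0.1, so it never leaves this machine.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_over_loopback())  # b'hello from localhost'
```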

But did you know that if you host an application on your own computer, people connected to the same network can often access it? How?

In a private network (for example, devices connected to the same Wi-Fi router), your computer has its own private IP address. Since other users are on the same network, they can reach your computer if they know that IP address. By reach, I mean several things. For example, users on the same network can SSH into your computer (if an SSH server is running on it), and they can access applications you host (like your web application).

In short, localhost (127.0.0.1) always points to your own computer. Every major OS uses localhost as the standard name for the loopback address.

Let's say your private IP address is 192.168.30.5 and you're serving a Node.js application on port 5000. On your own computer, you would access it as localhost:5000 or 127.0.0.1:5000. If another user on the same network wants to access your application, they have to visit 192.168.30.5:5000 (provided your application listens on all interfaces, such as 0.0.0.0, rather than only on 127.0.0.1). That's it. Note that your application is still not available on the public internet; after all, you're in a private network.

To find your IP address on Windows, type ipconfig in the command prompt. On Linux, it is generally ip addr show. From the list, find your Wi-Fi adapter and get its IPv4 address, which usually starts with 192.168.x.x (the other private ranges are 10.x.x.x and 172.16.x.x to 172.31.x.x).
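If you'd rather check from Python, here is a minimal sketch using the standard `socket` and `ipaddress` modules. `primary_ipv4` relies on a common trick, which is an assumption about your setup rather than a guarantee: calling `connect()` on a UDP socket sends no packets but makes the OS choose an outgoing interface, whose address is then read back. On machines with several adapters it may not pick your Wi-Fi adapter, so `ipconfig`/`ip addr show` remain the authoritative answer. Both function names are hypothetical.

```python
import ipaddress
import socket


def primary_ipv4() -> str:
    """Best-effort guess of this machine's LAN IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # connect() on a UDP socket sends no packets; it only makes the OS
            # choose the outgoing interface, whose address we read back.
            s.connect(("192.0.2.1", 80))  # documentation-only address (RFC 5737)
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no usable route: fall back to loopback


def describe(address: str) -> str:
    """Classify an IPv4 address the way this article uses the terms."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:  # check first: Python also counts 127.0.0.0/8 as private
        return "loopback (this computer only)"
    if ip.is_private:
        return "private (not routable on the public internet)"
    return "public"


if __name__ == "__main__":
    for addr in (primary_ipv4(), "127.0.0.1", "192.168.30.5", "10.220.20.23"):
        print(f"{addr:<15} {describe(addr)}")
```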

ports

At the software level, a port is a logical construct that identifies a specific process or service running on the server. Ports 0 to 1023 are the well-known ports, reserved for standard services (for example, SSH uses 22, HTTP uses 80 and HTTPS uses 443); on Linux, binding to them requires root privileges. For your own applications, ports from 3000 upwards are common choices. To address a specific web application, the port is appended to the IP address after a colon:

127.0.0.1:5000

In the above example, the port number is 5000, so the server listening on port 5000 will get the request. While that server is active (listening), port 5000 is occupied and no other process can bind to it.
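The "no other process can bind to it" rule is easy to verify. In this minimal sketch, a plain socket stands in for a running server, and the hypothetical helper `port_is_free` tries to bind the same port; the OS refuses with an `OSError` (typically "Address already in use").

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a new socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:  # typically "Address already in use"
            return False
        return True


def demo_port_conflict() -> bool:
    """Occupy a port like a running server would, then probe it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen()
        port = server.getsockname()[1]
        return port_is_free(port)  # probed while the "server" still holds it


if __name__ == "__main__":
    print("port free while a server listens on it?", demo_port_conflict())  # False
```

This is also why starting a second copy of the same app on the same port fails with an "address already in use" error.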

You can run multiple web applications on one machine by giving each a different port. Let's say you have a Node.js API running on port 5000, a frontend running on port 5001 and a database service running on port 27017. Assuming your droplet's IP address is 10.220.20.23, you can access them as:

10.220.20.23:5000 // API
10.220.20.23:5001 // Frontend
10.220.20.23:27017 // database
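With several services on one machine, a quick way to see which ports are actually accepting connections is to try a TCP connection to each. This is a minimal sketch: `SERVICES` mirrors the example layout above (assumed ports; adjust to your own), the function names are hypothetical, and `is_listening` only tells you that some process accepts connections on the port, not which application it is.

```python
import socket
from typing import Dict

# Example layout from the text (assumed ports; adjust to your own services).
SERVICES = {"API": 5000, "Frontend": 5001, "database": 27017}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if some process accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            return s.connect_ex((host, port)) == 0
        except OSError:  # e.g. timeouts or unreachable hosts
            return False


def check_services(host: str, services: Dict[str, int]) -> Dict[str, bool]:
    """Map each service name to whether its port is accepting connections."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    # On the droplet itself, 127.0.0.1 works; from outside, use its IP address.
    for name, up in check_services("127.0.0.1", SERVICES).items():
        print(f"{name:<10} port {SERVICES[name]:<6} {'up' if up else 'down'}")
```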

127.0.0.1 vs 0.0.0.0

Based on previous discussions, you already know what localhost or 127.0.0.1 is. When deploying your application, you might also need to use the address 0.0.0.0 in your server config. In the context of servers, 0.0.0.0 means all IPv4 addresses on the local machine. If a host has two IP addresses, 192.168.1.1 and 10.1.2.1, and a server running on the host listens on 0.0.0.0, it will be reachable at both of those IPs.
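Here is a minimal sketch of the difference, using only the standard library. It opens one listener bound to 127.0.0.1 and one bound to 0.0.0.0 (each on an OS-chosen free port) and tries to reach each via loopback and via the machine's LAN address. The LAN address is found with the common UDP `connect()` trick, an assumption that may pick a different adapter on machines with several network interfaces. All function names are hypothetical.

```python
import socket
from typing import Dict, Tuple


def lan_ipv4() -> str:
    """Best-effort LAN address (UDP connect() sends nothing, it only picks a route)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect(("192.0.2.1", 80))  # documentation-only address (RFC 5737)
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no network route: fall back to loopback


def reachable(address: str, port: int) -> bool:
    """True if a TCP connection to address:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
        c.settimeout(1.0)
        try:
            return c.connect_ex((address, port)) == 0
        except OSError:
            return False


def compare_bindings() -> Dict[str, Tuple[bool, bool]]:
    """For each bind address, report (reachable via 127.0.0.1, reachable via LAN IP)."""
    lan_ip = lan_ipv4()
    results = {}
    for bind_host in ("127.0.0.1", "0.0.0.0"):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((bind_host, 0))  # OS-chosen free port
            server.listen()
            port = server.getsockname()[1]
            results[bind_host] = (reachable("127.0.0.1", port), reachable(lan_ip, port))
    return results


if __name__ == "__main__":
    print("LAN address:", lan_ipv4())
    for bind_host, (via_loopback, via_lan) in compare_bindings().items():
        print(f"bound to {bind_host:<9} loopback={via_loopback} lan={via_lan}")
```

On a typical Linux machine with a LAN address, you should see the 127.0.0.1 listener reachable only via loopback, while the 0.0.0.0 listener answers on both addresses. That is why servers meant to be reached from outside are usually bound to 0.0.0.0, while internal services can stay on 127.0.0.1.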

What are Nginx and Apache?

To process incoming connections from the outside world (the public internet), you need a web server.

A web server knows how to talk to the internet using HTTP and HTTPS network protocols
It can route requests to different services based on the request type (the fancy name is reverse proxying). You might have the backend API and frontend hosted on the same machine; Nginx/Apache can forward API requests to the API service and frontend requests to the frontend service
Helps you set up domain names
Creating different rules for different URLs. For example, you might want to redirect users from 'oldsite.com' to 'newsite.com'.
Caching
SSL

...and a lot of things.
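Real Nginx expresses these rules in its own configuration language (for example, `location` blocks with `proxy_pass` for forwarding and `return 301` for redirects). The Python below is only a hypothetical model of the decision a reverse proxy makes for each request, reusing the API/frontend ports and the oldsite.com to newsite.com redirect mentioned above; it does not forward any traffic itself.

```python
from typing import Tuple

# Hypothetical upstreams: the Node.js API and frontend from the ports example.
API_UPSTREAM = "http://127.0.0.1:5000"
FRONTEND_UPSTREAM = "http://127.0.0.1:5001"
REDIRECTS = {"oldsite.com": "https://newsite.com"}


def route(host: str, path: str) -> Tuple[str, str]:
    """Decide what a reverse proxy should do with a request.

    Returns ("redirect", url) or ("proxy", upstream_url).
    """
    if host in REDIRECTS:
        # Redirect to the new domain, keeping the requested path.
        return "redirect", REDIRECTS[host] + path
    if path == "/api" or path.startswith("/api/"):
        # API calls go to the backend service.
        return "proxy", API_UPSTREAM + path
    # Everything else is served by the frontend.
    return "proxy", FRONTEND_UPSTREAM + path


if __name__ == "__main__":
    for host, path in [("oldsite.com", "/about"),
                       ("newsite.com", "/api/users"),
                       ("newsite.com", "/")]:
        print(host, path, "->", route(host, path))
```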

Everything a web server like Nginx can do is beyond the scope of this article. The items listed above are the very basics.

A web server is essential to run your web application. In a regular setup, you will use Nginx/Apache to serve your application. Even when you're not using one directly, a web server is usually running somewhere higher up in the stack.

Then what are Gunicorn and uWSGI in the Python world?

Gunicorn and uWSGI are application servers. To talk to a Python web application, we need WSGI (Web Server Gateway Interface). Gunicorn and uWSGI implement WSGI, whereas Nginx/Apache don't out of the box; they are used for more general-purpose server work. Gunicorn or uWSGI works as a middleman between Nginx and the Python web application.
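WSGI itself is just a calling convention (PEP 3333): the server calls your application with a dictionary describing the request and a `start_response` callback, and the application returns an iterable of bytes. A Flask `app` object or Django's `application` in `wsgi.py` is exactly such a callable. The minimal sketch below writes one by hand and calls it the way a WSGI server would; `call_app` is a hypothetical test helper, and `wsgiref.util.setup_testing_defaults` fills in the required WSGI keys.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: the interface Gunicorn/uWSGI call into."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of bytes, as WSGI requires


def call_app(path: str):
    """Call `app` the way a WSGI server would, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app("/status"))  # ('200 OK', b'Hello from /status')
```

If this code lived in a file named myapp.py (a hypothetical name), running `gunicorn myapp:app` would serve it.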

So, in a Python setup, Nginx is responsible for reverse proxying, caching, SSL, serving static files etc., while dynamic requests that should be handled by the Flask/Django application are passed to Gunicorn/uWSGI. The interaction between the two is as follows:

Nginx receives the incoming request from the internet
Based on its config, Nginx decides whether the request needs to be passed to Gunicorn/uWSGI
Gunicorn passes the request to the Python application
The application processes it and sends the result back to Gunicorn
Gunicorn sends the result back to Nginx
Nginx replies to the user with the response
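On the Gunicorn side of this flow, configuration can live in a Python file. The sketch below is an assumed minimal `gunicorn.conf.py`: the setting names `bind`, `workers`, `accesslog` and `errorlog` are Gunicorn's, but the values are illustrative. Binding to 127.0.0.1 means only Nginx on the same machine can reach Gunicorn, with Nginx forwarding to it via a directive such as `proxy_pass http://127.0.0.1:8000;`.

```python
# gunicorn.conf.py: Gunicorn reads its settings from plain Python variables.
import multiprocessing

# Listen on loopback only: Nginx on the same server forwards dynamic
# requests here, so Gunicorn itself is never exposed to the internet.
bind = "127.0.0.1:8000"

# Gunicorn's docs suggest (2 x CPU cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```

You would then start it with something like `gunicorn -c gunicorn.conf.py myapp:app`, where `myapp:app` is a placeholder for your module and WSGI callable.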

What if my user base grows and I have to handle 1000 req/sec? The current deployment process is too lengthy and counter-productive; how do I automate it...?

Woah! Slow down. We just scratched the surface here. Cloud deployment is a whole topic of its own, on which you can take full-year courses. Nowadays, the field is called DevOps, and people working in it are called DevOps Engineers. There are also Site Reliability Engineers (a role similar to DevOps, but not identical). Once you're comfortable with the basics, you can take courses on DevOps and learn more. An interesting world awaits you.

Where to go from here?

Once you are confident that you understand the basics of server deployment, try these:

Learn shell scripting. It's the most underrated tool for an engineer working in the technical or IT sector. Whenever you want to automate something or add custom behavior in Linux, you will write shell scripts. Start learning from a crash course here.
Learn Docker. It's one of the most widely used deployment tools, no matter where you go. For this, you can follow the Docker Mastery course on Udemy.
Learn Jenkins. You can take a CI/CD Jenkins course on Udemy.
If you're really serious about DevOps and want to pursue your career in this role, you should also learn Kubernetes

That's it for today, folks. I will try to write more awesome articles for both beginners and advanced level users in the future. I will also share my technical experiences here. Stay tuned. If you like this article, please share it with your network. Thank you for reading this to the end.
