logo资料库

Grokking the System Design Interview by Design Gurus (z-lib.org)....pdf

第1页 / 共163页
第2页 / 共163页
第3页 / 共163页
第4页 / 共163页
第5页 / 共163页
第6页 / 共163页
第7页 / 共163页
第8页 / 共163页
资料共163页,剩余部分请下载后查看
System Design Interviews: A step by step guide
Step 1: Requirements clarifications
Step 2: System interface definition
Step 3: Back-of-the-envelope estimation
Step 4: Defining data model
Step 5: High-level design
Step 6: Detailed design
Step 7: Identifying and resolving bottlenecks
Summary
Designing a URL Shortening service like TinyURL
1. Why do we need URL shortening?
2. Requirements and Goals of the System
3. Capacity Estimation and Constraints
4. System APIs
5. Database Design
Database Schema:
6. Basic System Design and Algorithm
a. Encoding actual URL
b. Generating keys offline
7. Data Partitioning and Replication
8. Cache
9. Load Balancer (LB)
10. Purging or DB cleanup
11. Telemetry
12. Security and Permissions
Designing Pastebin
1. What is Pastebin?
2. Requirements and Goals of the System
3. Some Design Considerations
4. Capacity Estimation and Constraints
5. System APIs
6. Database Design
Database Schema:
7. High Level Design
8. Component Design
a. Application layer
b. Datastore layer
9. Purging or DB Cleanup
10. Data Partitioning and Replication
11. Cache and Load Balancer
12. Security and Permissions
Designing Instagram
1. What is Instagram?
2. Requirements and Goals of the System
3. Some Design Considerations
4. Capacity Estimation and Constraints
5. High Level System Design
6. Database Schema
7. Data Size Estimation
8. Component Design
9. Reliability and Redundancy
10. Data Sharding
11. Ranking and News Feed Generation
12. News Feed Creation with Sharded Data
13. Cache and Load balancing
Designing Dropbox
1. Why Cloud Storage?
2. Requirements and Goals of the System
3. Some Design Considerations
4. Capacity Estimation and Constraints
5. High Level Design
6. Component Design
a. Client
b. Metadata Database
c. Synchronization Service
d. Message Queuing Service
e. Cloud/Block Storage
7. File Processing Workflow
8. Data Deduplication
9. Metadata Partitioning
10. Caching
11. Load Balancer (LB)
12. Security, Permissions and File Sharing
Designing Facebook Messenger
1. What is Facebook Messenger?
2. Requirements and Goals of the System
3. Capacity Estimation and Constraints
4. High Level Design
5. Detailed Component Design
a. Messages Handling
b. Storing and retrieving the messages from the database
c. Managing user’s status
6. Data partitioning
7. Cache
8. Load balancing
9. Fault tolerance and Replication
10. Extended Requirements
a. Group chat
b. Push notifications
Designing Twitter
1. What is Twitter?
2. Requirements and Goals of the System
3. Capacity Estimation and Constraints
4. System APIs
5. High Level System Design
6. Database Schema
7. Data Sharding
8. Cache
9. Timeline Generation
10. Replication and Fault Tolerance
11. Load Balancing
12. Monitoring
13. Extended Requirements
Designing Youtube or Netflix
1. Why Youtube?
2. Requirements and Goals of the System
3. Capacity Estimation and Constraints
4. System APIs
5. High Level Design
6. Database Schema
7. Detailed Component Design
8. Metadata Sharding
9. Video Deduplication
10. Load Balancing
11. Cache
12. Content Delivery Network (CDN)
13. Fault Tolerance
Designing Typeahead Suggestion
1. What is Typeahead Suggestion?
2. Requirements and Goals of the System
3. Basic System Design and Algorithm
4. Permanent Storage of the Trie
5. Scale Estimation
6. Data Partition
7. Cache
8. Replication and Load Balancer
9. Fault Tolerance
10. Typeahead Client
11. Personalization
Designing an API Rate Limiter
1. What is a Rate Limiter?
2. Why do we need API rate limiting?
3. Requirements and Goals of the System
4. How to do Rate Limiting?
5. What are different types of throttling?
6. What are different types of algorithms used for Rate Limiting?
7. High level design for Rate Limiter
8. Basic System Design and Algorithm
9. Sliding Window algorithm
10. Sliding Window with Counters
11. Data Sharding and Caching
12. Should we rate limit by IP or by user?
Designing Twitter Search
1. What is Twitter Search?
2. Requirements and Goals of the System
3. Capacity Estimation and Constraints
4. System APIs
5. High Level Design
6. Detailed Component Design
7. Fault Tolerance
8. Cache
9. Load Balancing
10. Ranking
Designing a Web Crawler
1. What is a Web Crawler?
2. Requirements and Goals of the System
3. Some Design Considerations
4. Capacity Estimation and Constraints
5. High Level design
How to crawl?
Difficulties in implementing efficient web crawler
6. Detailed Component Design
7. Fault tolerance
8. Data Partitioning
9. Crawler Traps
Designing Facebook’s Newsfeed
1. What is Facebook’s newsfeed?
2. Requirements and Goals of the System
3. Capacity Estimation and Constraints
4. System APIs
5. Database Design
6. High Level System Design
7. Detailed Component Design
8. Feed Ranking
9. Data Partitioning
Designing Yelp or Nearby Friends
1. Why Yelp or Proximity Server?
2. Requirements and Goals of the System
3. Scale Estimation
4. Database Schema
5. System APIs
6. Basic System Design and Algorithm
a. SQL solution
b. Grids
c. Dynamic size grids
7. Data Partitioning
8. Replication and Fault Tolerance
9. Cache
10. Load Balancing (LB)
11. Ranking
Designing Uber backend
1. What is Uber?
2. Requirements and Goals of the System
3. Capacity Estimation and Constraints
4. Basic System Design and Algorithm
5. Fault Tolerance and Replication
6. Ranking
7. Advanced Issues
Design Ticketmaster (*New*)
1. What is an online movie ticket booking system?
2. Requirements and Goals of the System
3. Some Design Considerations
4. Capacity Estimation
5. System APIs
6. Database Design
7. High Level Design
8. Detailed Component Design
9. Concurrency
10. Fault Tolerance
11. Data Partitioning
Additional Resources
System Design Basics
Key Characteristics of Distributed Systems
Scalability
Reliability
Availability
Efficiency
Serviceability or Manageability
Load Balancing
Benefits of Load Balancing
Load Balancing Algorithms
Redundant Load Balancers
Caching
Application server cache
Content Distribution Network (CDN)
Cache Invalidation
Cache eviction policies
Sharding or Data Partitioning
1. Partitioning Methods
2. Partitioning Criteria
3. Common Problems of Sharding
Indexes
Example: A library catalog
How do Indexes decrease write performance?
Proxies
Proxy Server Types
Open Proxy
Reverse Proxy
Redundancy and Replication
SQL vs. NoSQL
SQL
NoSQL
High level differences between SQL and NoSQL
SQL VS. NoSQL - Which one to use?
Reasons to use SQL database
Reasons to use NoSQL database
CAP Theorem
Consistent Hashing
What is Consistent Hashing?
How does it work?
Long-Polling vs WebSockets vs Server-Sent Events
Ajax Polling
HTTP Long-Polling
WebSockets
Server-Sent Events (SSEs)
System Design Interviews: A step by step guide A lot of software engineers struggle with system design interviews (SDIs) primarily because of three reasons: • The unstructured nature of SDIs, where they are asked to work on an open-ended design problem that doesn’t have a standard answer. • Their lack of experience in developing large scale systems. • They did not prepare for SDIs. Like coding interviews, candidates who haven’t put a conscious effort to prepare for SDIs, mostly perform poorly especially at top companies like Google, Facebook, Amazon, Microsoft, etc. In these companies, candidates who don’t perform above average, have a limited chance to get an offer. On the other hand, a good performance always results in a better offer (higher position and salary), since it shows the candidate’s ability to handle a complex system. In this course, we’ll follow a step by step approach to solve multiple design problems. First, let’s go through these steps: Step 1: Requirements clarifications It is always a good idea to ask questions about the exact scope of the problem we are solving. Design questions are mostly open-ended, and they don’t have ONE correct answer, that’s why clarifying ambiguities early in the interview becomes critical. Candidates who spend enough time to define the end goals of the system always have a better chance to be successful in the interview. Also, since we only have 35-40 minutes to design a (supposedly) large system, we should clarify what parts of the system we will be focusing on. Let’s expand this with an actual example of designing a Twitter-like service. Here are some questions for designing Twitter that should be answered before moving on to the next steps: • Will users of our service be able to post tweets and follow other people? • Should we also design to create and display the user’s timeline? • Will tweets contain photos and videos? • Are we focusing on the backend only or are we developing the front-end too? • Will users be able to search tweets? • Do we need to display hot trending topics? • Will there be any push notification for new (or important) tweets? All such question will determine how our end design will look like.
Step 2: System interface definition Define what APIs are expected from the system. This will not only establish the exact contract expected from the system, but will also ensure if we haven’t gotten any requirements wrong. Some examples for our Twitter-like service will be: postTweet(user_id, tweet_data, tweet_location, user_location, timestamp, …) generateTimeline(user_id, current_time, user_location, …) markTweetFavorite(user_id, tweet_id, timestamp, …) Step 3: Back-of-the-envelope estimation It is always a good idea to estimate the scale of the system we’re going to design. This will also help later when we will be focusing on scaling, partitioning, load balancing and caching. • What scale is expected from the system (e.g., number of new tweets, number of tweet views, number of timeline generations per sec., etc.)? • How much storage will we need? We will have different numbers if users can have photos and videos in their tweets. • What network bandwidth usage are we expecting? This will be crucial in deciding how we will manage traffic and balance load between servers. Step 4: Defining data model Defining the data model early will clarify how data will flow among different components of the system. Later, it will guide towards data partitioning and management. The candidate should be able to identify various entities of the system, how they will interact with each other, and different aspect of data management like storage, transportation, encryption, etc. Here are some entities for our Twitter- like service: User: UserID, Name, Email, DoB, CreationData, LastLogin, etc. Tweet: TweetID, Content, TweetLocation, NumberOfLikes, TimeStamp, etc. UserFollowo: UserdID1, UserID2 FavoriteTweets: UserID, TweetID, TimeStamp Which database system should we use? Will NoSQL like Cassandra best fit our needs, or should we use a MySQL-like solution? What kind of block storage should we use to store photos and videos? Step 5: High-level design Draw a block diagram with 5-6 boxes representing the core components of our system. We should identify enough components that are needed to solve the actual problem from end-to-end. For Twitter, at a high-level, we will need multiple application servers to serve all the read/write requests with load balancers in front of them for traffic distributions. If we’re assuming that we will have a lot more read traffic (as compared to write), we can decide to have separate servers for handling these scenarios. On the backend, we need an efficient database that can store all the tweets and can
support a huge number of reads. We will also need a distributed file storage system for storing photos and videos. Step 6: Detailed design Dig deeper into two or three components; interviewer’s feedback should always guide us what parts of the system need further discussion. We should be able to present different approaches, their pros and cons, and explain why we will prefer one approach on the other. Remember there is no single answer, the only important thing is to consider tradeoffs between different options while keeping system constraints in mind. • Since we will be storing a massive amount of data, how should we partition our data to distribute it to multiple databases? Should we try to store all the data of a user on the same database? What issue could it cause? • How will we handle hot users who tweet a lot or follow lots of people? • Since users’ timeline will contain the most recent (and relevant) tweets, should we try to store our data in such a way that is optimized for scanning the latest tweets? • How much and at which layer should we introduce cache to speed things up? • What components need better load balancing? Step 7: Identifying and resolving bottlenecks Try to discuss as many bottlenecks as possible and different approaches to mitigate them.
Is there any single point of failure in our system? What are we doing to mitigate it? • • Do we have enough replicas of the data so that if we lose a few servers we can still serve our users? • Similarly, do we have enough copies of different services running such that a few failures will not cause total system shutdown? • How are we monitoring the performance of our service? Do we get alerts whenever critical components fail or their performance degrades? Summary In short, preparation and being organized during the interview are the keys to be successful in system design interviews. The above-mentioned steps should guide you to remain on track and cover all the different aspects while designing a system. Let’s apply the above guidelines to design a few systems that are asked in SDIs. Designing a URL Shortening service like TinyURL Let's design a URL shortening service like TinyURL. This service will provide short aliases redirecting to long URLs. Similar services: bit.ly, goo.gl, qlink.me, etc. Difficulty Level: Easy 1. Why do we need URL shortening? URL shortening is used to create shorter aliases for long URLs. We call these shortened aliases “short links.” Users are redirected to the original URL when they hit these short links. Short links save a lot of space when displayed, printed, messaged, or tweeted. Additionally, users are less likely to mistype shorter URLs. For example, if we shorten this page through TinyURL: https://www.educative.io/collection/page/5668639101419520/5649050225344512/5668600 916475904/ We would get: http://tinyurl.com/jlg8zpc The shortened URL is nearly one-third the size of the actual URL. URL shortening is used for optimizing links across devices, tracking individual links to analyze audience and campaign performance, and hiding affiliated original URLs.
If you haven’t used tinyurl.com before, please try creating a new shortened URL and spend some time going through the various options their service offers. This will help you a lot in understanding this chapter. 2. Requirements and Goals of the System You should always clarify requirements at the beginning of the interview. Be sure to ask questions to find the exact scope of the system that the interviewer has in mind. Our URL shortening system should meet the following requirements: Functional Requirements: 1. Given a URL, our service should generate a shorter and unique alias of it. This is called a short link. 2. When users access a short link, our service should redirect them to the original link. 3. Users should optionally be able to pick a custom short link for their URL. 4. Links will expire after a standard default timespan. Users should be able to specify the expiration time. Non-Functional Requirements: 1. The system should be highly available. This is required because, if our service is down, all the URL redirections will start failing. 2. URL redirection should happen in real-time with minimal latency. 3. Shortened links should not be guessable (not predictable). Extended Requirements: 1. Analytics; e.g., how many times a redirection happened? 2. Our service should also be accessible through REST APIs by other services. 3. Capacity Estimation and Constraints Our system will be read-heavy. There will be lots of redirection requests compared to new URL shortenings. Let’s assume 100:1 ratio between read and write. Traffic estimates: Assuming, we will have 500M new URL shortenings per month, with 100:1 read/write ratio, we can expect 50B redirections during the same period: 100 * 500M => 50B What would be Queries Per Second (QPS) for our system? New URLs shortenings per second: 500 million / (30 days * 24 hours * 3600 seconds) = ~200 URLs/s Considering 100:1 read/write ratio, URLs redirections per second will be: 100 * 200 URLs/s = 20K/s
Storage estimates: Let’s assume we store every URL shortening request (and associated shortened link) for 5 years. Since we expect to have 500M new URLs every month, the total number of objects we expect to store will be 30 billion: 500 million * 5 years * 12 months = 30 billion Let’s assume that each stored object will be approximately 500 bytes (just a ballpark estimate–we will dig into it later). We will need 15TB of total storage: 30 billion * 500 bytes = 15 TB Bandwidth estimates: For write requests, since we expect 200 new URLs every second, total incoming data for our service will be 100KB per second: 200 * 500 bytes = 100 KB/s For read requests, since every second we expect ~20K URLs redirections, total outgoing data for our service would be 10MB per second: 20K * 500 bytes = ~10 MB/s Memory estimates: If we want to cache some of the hot URLs that are frequently accessed, how much memory will we need to store them? If we follow the 80-20 rule, meaning 20% of URLs generate 80% of traffic, we would like to cache these 20% hot URLs. Since we have 20K requests per second, we will be getting 1.7 billion requests per day: 20K * 3600 seconds * 24 hours = ~1.7 billion To cache 20% of these requests, we will need 170GB of memory. 0.2 * 1.7 billion * 500 bytes = ~170GB One thing to note here is that since there will be a lot of duplicate requests (of the same URL), therefore, our actual memory usage will be less than 170GB. High level estimates: Assuming 500 million new URLs per month and 100:1 read:write ratio, following is the summary of the high level estimates for our service: 200/s 20K/s 100KB/s 10MB/s New URLs URL redirections Incoming data Outgoing data Storage for 5 years 15TB Memory for cache 170GB 4. System APIs Once we've finalized the requirements, it's always a good idea to define the system APIs. This should explicitly state what is expected from the system.
We can have SOAP or REST APIs to expose the functionality of our service. Following could be the definitions of the APIs for creating and deleting URLs: createURL(api_dev_key, original_url, custom_alias=None, user_name=None, expire_date=None) Parameters: api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota. original_url (string): Original URL to be shortened. custom_alias (string): Optional custom key for the URL. user_name (string): Optional user name to be used in encoding. expire_date (string): Optional expiration date for the shortened URL. Returns: (string) A successful insertion returns the shortened URL; otherwise, it returns an error code. deleteURL(api_dev_key, url_key) Where “url_key” is a string representing the shortened URL to be retrieved. A successful deletion returns ‘URL Removed’. How do we detect and prevent abuse? A malicious user can put us out of business by consuming all URL keys in the current design. To prevent abuse, we can limit users via their api_dev_key. Each api_dev_key can be limited to a certain number of URL creations and redirections per some time period (which may be set to a different duration per developer key). 5. Database Design Defining the DB schema in the early stages of the interview would help to understand the data flow among various components and later would guide towards data partitioning. A few observations about the nature of the data we will store: 1. We need to store billions of records. 2. Each object we store is small (less than 1K). 3. There are no relationships between records—other than storing which user created a URL. 4. Our service is read-heavy. Database Schema: We would need two tables: one for storing information about the URL mappings, and one for the user’s data who created the short link.
URL User Hash: varchar(16) PK UserID: int PK OriginalURL: varchar(512) Name: varchar(20) CreationDate: datetime Email: varchar(32) ExpirationDate: datatime CreationDate: datetime
[Not supported by viewer] LastLogin: datatime What kind of database should we use? Since we anticipate storing billions of rows, and we don’t need to use relationships between objects – a NoSQL key-value store like DynamoDB, Cassandra or Riak is a better choice. A NoSQL choice would also be easier to scale. Please see SQL vs NoSQL for more details. 6. Basic System Design and Algorithm The problem we are solving here is, how to generate a short and unique key for a given URL. In the TinyURL example in Section 1, the shortened URL is “http://tinyurl.com/jlg8zpc”. The last six characters of this URL is the short key we want to generate. We’ll explore two solutions here: a. Encoding actual URL We can compute a unique hash (e.g., MD5 or SHA256, etc.) of the given URL. The hash can then be encoded for displaying. This encoding could be base36 ([a-z ,0-9]) or base62 ([A-Z, a-z, 0-9]) and if we add ‘-’ and ‘.’ we can use base64 encoding. A reasonable question would be, what should be the length of the short key? 6, 8 or 10 characters. Using base64 encoding, a 6 letter long key would result in 64^6 = ~68.7 billion possible strings Using base64 encoding, an 8 letter long key would result in 64^8 = ~281 trillion possible strings With 68.7B unique strings, let’s assume six letter keys would suffice for our system. If we use the MD5 algorithm as our hash function, it’ll produce a 128-bit hash value. After base64 encoding, we’ll get a string having more than 21 characters (since each base64 character encodes 6 bits of the hash value). Since we only have space for 8 characters per short key, how will we choose our key then? We can take the first 6 (or 8) letters for the key. This could result in key duplication though, upon which we can choose some other characters out of the encoding string or swap some characters. What are different issues with our solution? We have the following couple of problems with our encoding scheme: 1. If multiple users enter the same URL, they can get the same shortened URL, which is not acceptable.
分享到:
收藏