The data was downloaded from Kaggle: https://www.kaggle.com/prasertk/netflix-subscription-price-in-different-countries
The dataset contains details on Netflix subscription fees and content offerings across several countries where Netflix services are available.
Note: The dataset does not include ALL countries where Netflix operates.
Goal: determine where Netflix customers potentially experience the most satisfaction from their subscription.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
#Read the file
netflix_df = pd.read_csv('netflix dec 2021.csv')
netflix_df.head()
| | Country_code | Country | Total Library Size | No. of TV Shows | No. of Movies | Cost Per Month - Basic ($) | Cost Per Month - Standard ($) | Cost Per Month - Premium ($) |
---|---|---|---|---|---|---|---|---|
0 | ar | Argentina | 4760 | 3154 | 1606 | 3.74 | 6.30 | 9.26 |
1 | au | Australia | 6114 | 4050 | 2064 | 7.84 | 12.12 | 16.39 |
2 | at | Austria | 5640 | 3779 | 1861 | 9.03 | 14.67 | 20.32 |
3 | be | Belgium | 4990 | 3374 | 1616 | 10.16 | 15.24 | 20.32 |
4 | bo | Bolivia | 4991 | 3155 | 1836 | 7.99 | 10.99 | 13.99 |
netflix_df.columns
Index(['Country_code', 'Country', 'Total Library Size', 'No. of TV Shows',
'No. of Movies', 'Cost Per Month - Basic ($)',
'Cost Per Month - Standard ($)', 'Cost Per Month - Premium ($)'],
dtype='object')
#Checking the entire data frame for missing values
netflix_df.isna().any().any()
False
In this stage we are adding new features that will provide more insight into the data.
We will identify the region (continent) each country falls under based on the country name.
import pycountry_convert as pc
#Creating a function to identify the continent from the country name
def country_to_continent(country_name):
    country_alpha2 = pc.country_name_to_country_alpha2(country_name)
    country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
    country_continent_name = pc.convert_continent_code_to_continent_name(country_continent_code)
    return country_continent_name
#Creating a column to indicate the continent for each country
netflix_df['Region'] = netflix_df['Country'].apply(lambda country_name: country_to_continent(country_name))
netflix_df.head()
| | Country_code | Country | Total Library Size | No. of TV Shows | No. of Movies | Cost Per Month - Basic ($) | Cost Per Month - Standard ($) | Cost Per Month - Premium ($) | Region |
---|---|---|---|---|---|---|---|---|---|
0 | ar | Argentina | 4760 | 3154 | 1606 | 3.74 | 6.30 | 9.26 | South America |
1 | au | Australia | 6114 | 4050 | 2064 | 7.84 | 12.12 | 16.39 | Oceania |
2 | at | Austria | 5640 | 3779 | 1861 | 9.03 | 14.67 | 20.32 | Europe |
3 | be | Belgium | 4990 | 3374 | 1616 | 10.16 | 15.24 | 20.32 | Europe |
4 | bo | Bolivia | 4991 | 3155 | 1836 | 7.99 | 10.99 | 13.99 | South America |
To make the DataFrame more readable, we can move the “Region” column right after the “Country” column.
#Creating a function to reposition columns
def movecol(df, cols_to_move=[], ref_col='', place='After'):
    #cols_to_move can be one or more columns we wish to move
    #ref_col is the column after or before which we want to reposition the column(s)
    cols = df.columns.tolist()
    if place == 'After':
        seg1 = cols[:list(cols).index(ref_col) + 1]
        seg2 = cols_to_move
    if place == 'Before':
        seg1 = cols[:list(cols).index(ref_col)]
        seg2 = cols_to_move + [ref_col]
    seg1 = [i for i in seg1 if i not in seg2]
    seg3 = [i for i in cols if i not in seg1 + seg2]
    return(df[seg1 + seg2 + seg3])
netflix_df = movecol(netflix_df, cols_to_move=['Region'], ref_col='Country', place='After')
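The reordering logic can be checked on a plain list of column names, without a DataFrame. The sketch below uses a hypothetical helper, reorder_columns, which is not part of the original notebook or of pandas. It only mirrors the 'After' and 'Before' behaviour described above.

```python
def reorder_columns(cols, cols_to_move, ref_col, place="After"):
    """Return a new column order with cols_to_move placed next to ref_col.

    Hypothetical helper for illustration; it works on plain lists of names.
    """
    if place not in ("After", "Before"):
        raise ValueError("place must be 'After' or 'Before'")
    # Everything except the columns being moved, in their original order
    remaining = [c for c in cols if c not in cols_to_move]
    # Position of the reference column among the remaining columns
    pos = remaining.index(ref_col)
    if place == "After":
        pos += 1
    return remaining[:pos] + list(cols_to_move) + remaining[pos:]


columns = ["Country_code", "Country", "Total Library Size", "Region"]
new_order = reorder_columns(columns, ["Region"], "Country", place="After")
print(new_order)
```

Indexing a DataFrame with the resulting list (for example netflix_df[new_order]) would apply the same order to the real data.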
We will examine the relationship between the amount of content available and the basic subscription cost to determine which customers receive the most value for money.
A higher score indicates that customers can access more content per dollar spent on their Basic Netflix Subscription.
netflix_df['Value for Money'] = (netflix_df['Total Library Size']/netflix_df['Cost Per Month - Basic ($)']).round()
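As a quick sanity check of the formula, the same calculation can be done by hand for a few countries. This is a minimal sketch in plain Python. The library sizes and Basic prices are copied from rows that appear in this notebook's outputs, and value_for_money is an illustrative helper name.

```python
# (total library size, Basic cost per month in $), copied from notebook outputs
samples = {
    "Turkey": (4639, 1.97),
    "India": (5843, 2.64),
    "Liechtenstein": (3048, 12.88),
}


def value_for_money(library_size, basic_cost):
    """Titles available per dollar of Basic subscription, rounded."""
    return round(library_size / basic_cost)


scores = {country: value_for_money(*row) for country, row in samples.items()}
print(scores)
```

The results should match the 'Value for Money' column for the same countries.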
In this stage, we will examine the data to identify any patterns, trends and relationships between the variables. It will help us analyze the data and extract insights that can be used to make decisions.
Data Visualization will give us a clear idea of what the data means by giving it visual context.
#Number of countries we have data for
len(netflix_df['Country'])
65
netflix_df['Region'].value_counts().to_frame()
| | Region |
---|---|
Europe | 34 |
Asia | 12 |
South America | 10 |
North America | 6 |
Oceania | 2 |
Africa | 1 |
regions = netflix_df['Region'].value_counts()
regions.plot.pie(figsize=(8, 8), autopct='%1.0f%%',fontsize=15,labels=regions.index, pctdistance=0.85, startangle=55)
plt.title("Netflix Worldwide Coverage by Region", fontsize=16)
# autopct="%1.0f%%" shows the percentage to 0 decimal place
# pctdistance controls how far the values appear from the center of the circle
# startangle controls the angle of the slices, allowing the pie chart to be rotated to improve readability
Text(0.5, 1.0, 'Netflix Worldwide Coverage by Region')
We have data on 65 countries where Netflix operates across 6 regions.
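The percentages shown on the pie chart follow directly from the region counts. A minimal sketch in plain Python, with the counts copied from the value_counts() output above (region_counts and shares are illustrative names):

```python
# Number of countries per region, copied from the value_counts() output
region_counts = {
    "Europe": 34,
    "Asia": 12,
    "South America": 10,
    "North America": 6,
    "Oceania": 2,
    "Africa": 1,
}

total = sum(region_counts.values())

# Share of countries per region, rounded to whole percent like autopct='%1.0f%%'
shares = {region: round(100 * n / total) for region, n in region_counts.items()}

for region, pct in shares.items():
    print(f"{region}: {pct}%")
```

Because each share is rounded independently, the displayed percentages need not add up to exactly 100.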
content_stats = netflix_df['Total Library Size'].describe().round().to_frame()
tv = netflix_df['No. of TV Shows'].describe().round()
movies = netflix_df['No. of Movies'].describe().round()
content_stats = pd.concat([content_stats, tv, movies], axis=1)
content_stats
| | Total Library Size | No. of TV Shows | No. of Movies |
---|---|---|---|
count | 65.0 | 65.0 | 65.0 |
mean | 5314.0 | 3519.0 | 1795.0 |
std | 980.0 | 723.0 | 327.0 |
min | 2274.0 | 1675.0 | 373.0 |
25% | 4948.0 | 3154.0 | 1628.0 |
50% | 5195.0 | 3512.0 | 1841.0 |
75% | 5952.0 | 3832.0 | 1980.0 |
max | 7325.0 | 5234.0 | 2387.0 |
fig_dims = (10,10)
fig, ax = plt.subplots(figsize=fig_dims)
sns.histplot(data=netflix_df, x="Total Library Size",kde=True,binwidth=200)
plt.title("Total Netflix Content Library Size", fontsize=15)
Text(0.5, 1.0, 'Total Netflix Content Library Size')
netflix_content_stats_df = pd.DataFrame(columns = ['Description', 'Country', 'Total Library Size', 'No. of TV Shows', 'No. of Movies','Cost Per Month - Basic ($)', 'Cost Per Month - Standard ($)','Cost Per Month - Premium ($)'])
#Largest Library
max_lib = netflix_df[netflix_df['Total Library Size'] == netflix_df['Total Library Size'].max()]
max_lib.insert(0, "Description", "Largest Content Library", True)
#Smallest Library
min_lib = netflix_df[netflix_df['Total Library Size'] == netflix_df['Total Library Size'].min()]
min_lib.insert(0, "Description", "Smallest Content Library", True)
#Largest TV Library
max_tv = netflix_df[netflix_df['No. of TV Shows'] == netflix_df['No. of TV Shows'].max()]
max_tv.insert(0, "Description", "Largest Library of TV Shows", True)
#Smallest TV Library
min_tv = netflix_df[netflix_df['No. of TV Shows'] == netflix_df['No. of TV Shows'].min()]
min_tv.insert(0, "Description", "Smallest Library of TV Shows", True)
#Largest Movie Library
max_mov = netflix_df[netflix_df['No. of Movies'] == netflix_df['No. of Movies'].max()]
max_mov.insert(0, "Description", "Largest Library of Movies", True)
#Smallest Movie Library
min_mov = netflix_df[netflix_df['No. of Movies'] == netflix_df['No. of Movies'].min()]
min_mov.insert(0, "Description", "Smallest Library of Movies", True)
#Adding columns to empty dataframe
netflix_content_stats_df = pd.concat([netflix_content_stats_df, max_lib, min_lib, max_tv, min_tv, max_mov, min_mov], axis=0)
#Removing extra columns
netflix_content_stats_df = netflix_content_stats_df[['Description', 'Country', 'Total Library Size', 'No. of TV Shows', 'No. of Movies']]
netflix_content_stats_df
| | Description | Country | Total Library Size | No. of TV Shows | No. of Movies |
---|---|---|---|---|---|
12 | Largest Content Library | Czechia | 7325 | 5234 | 2091 |
11 | Smallest Content Library | Croatia | 2274 | 1675 | 599 |
12 | Largest Library of TV Shows | Czechia | 7325 | 5234 | 2091 |
11 | Smallest Library of TV Shows | Croatia | 2274 | 1675 | 599 |
35 | Largest Library of Movies | Malaysia | 5952 | 3565 | 2387 |
49 | Smallest Library of Movies | San Marino | 2310 | 1937 | 373 |
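As seen above, Czechia has both the largest overall library and the largest library of TV shows, while Croatia has the smallest of both. Malaysia has the largest library of movies and San Marino the smallest.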
sns.set_theme(style="whitegrid")
fig_dims = (10, 6)
fig, ax = plt.subplots(figsize=fig_dims)
#Creating a DataFrame with the Top 10 countries by library content
top_lib = netflix_df.sort_values(by=['Total Library Size'], ascending=False).head(10)
sns.set_color_codes("pastel")
sns.barplot(x=top_lib['Total Library Size'], y=top_lib['Country'], data=top_lib, label="All Content", color="b")
plt.title("Top Countries by Netflix Library Size", fontsize=15)
sns.set_color_codes("muted")
sns.barplot(x=top_lib['No. of TV Shows'], y=top_lib['Country'], data=top_lib, label="TV Shows", color="b")
sns.set_color_codes("pastel")
sns.barplot(x=top_lib['No. of Movies'], y=top_lib['Country'], data=top_lib, label="Movies", color="darkblue")
# Add a legend and informative axis label
ax.set(xlim=(0, 8000), ylabel="", xlabel="Library Size")
sns.despine(left=True, bottom=True)
# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
In the top 10 countries by library size we see that the Movie content remains around 2000 and the main differentiator is the number of TV Shows available in each country.
sns.set_theme(style="whitegrid")
fig_dims = (10, 6)
fig, ax = plt.subplots(figsize=fig_dims)
#Creating a DataFrame with the Bottom 10 countries by library content
bot_lib = netflix_df.sort_values(by=['Total Library Size'], ascending=True).head(10)
sns.set_color_codes("pastel")
sns.barplot(x=bot_lib['Total Library Size'], y=bot_lib['Country'], data=bot_lib, label="All Content", color="b")
plt.title("Top Countries by Smallest Netflix Library Size", fontsize=15)
sns.set_color_codes("muted")
sns.barplot(x=bot_lib['No. of TV Shows'], y=bot_lib['Country'], data=bot_lib, label="TV Shows", color="b")
sns.set_color_codes("pastel")
sns.barplot(x=bot_lib['No. of Movies'], y=bot_lib['Country'], data=bot_lib, label="Movies", color="darkblue")
# Add a legend and informative axis label
ax.set(xlim=(0, 5000), ylabel="", xlabel="Library Size")
sns.despine(left=True, bottom=True)
# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
In the 10 countries with the smallest libraries, we see that the Movie content remains around 1500 (except for Croatia and San Marino), and once again the main differentiator is the number of TV Shows available in each country.
subcription_stats = netflix_df['Cost Per Month - Basic ($)'].describe().round(2).to_frame()
standard = netflix_df['Cost Per Month - Standard ($)'].describe().round(2)
premium = netflix_df['Cost Per Month - Premium ($)'].describe().round(2)
subcription_stats = pd.concat([subcription_stats, standard, premium], axis=1)
subcription_stats
| | Cost Per Month - Basic ($) | Cost Per Month - Standard ($) | Cost Per Month - Premium ($) |
---|---|---|---|
count | 65.00 | 65.00 | 65.00 |
mean | 8.37 | 11.99 | 15.61 |
std | 1.94 | 2.86 | 4.04 |
min | 1.97 | 3.00 | 4.02 |
25% | 7.99 | 10.71 | 13.54 |
50% | 8.99 | 11.49 | 14.45 |
75% | 9.03 | 13.54 | 18.06 |
max | 12.88 | 20.46 | 26.96 |
The same subscription tier can cost very different amounts depending on the country: for the Premium tier, the gap between the most expensive country ($26.96) and the cheapest ($4.02) is $22.94 per month.
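The size of the price gap for each tier can be read off the min and max rows of the summary table. A small sketch in plain Python, with the values copied from that table (tier_range and gaps are illustrative names):

```python
# (min, max) monthly price in $ per tier, copied from the describe() output
tier_range = {
    "Basic": (1.97, 12.88),
    "Standard": (3.00, 20.46),
    "Premium": (4.02, 26.96),
}

# Difference between the most and least expensive country for each tier
gaps = {tier: round(high - low, 2) for tier, (low, high) in tier_range.items()}

for tier, gap in gaps.items():
    print(f"{tier}: ${gap:.2f}")
```

The $22.94 figure corresponds to the Premium tier. The Basic and Standard gaps are smaller.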
fig_dims = (8,8)
fig, ax = plt.subplots(figsize=fig_dims)
sns.histplot(data=netflix_df, x="Cost Per Month - Basic ($)",kde=True,binwidth=0.5)
plt.title("Cost Per Month - Basic ($)", fontsize=15)
Text(0.5, 1.0, 'Cost Per Month - Basic ($)')
fig_dims = (8,8)
fig, ax = plt.subplots(figsize=fig_dims)
sns.histplot(data=netflix_df, x="Cost Per Month - Standard ($)",kde=True,binwidth=1)
plt.title("Cost Per Month - Standard ($)", fontsize=15)
Text(0.5, 1.0, 'Cost Per Month - Standard ($)')
fig_dims = (8,8)
fig, ax = plt.subplots(figsize=fig_dims)
sns.histplot(data=netflix_df, x="Cost Per Month - Premium ($)",kde=True,binwidth=1)
plt.title("Cost Per Month - Premium ($)", fontsize=15)
Text(0.5, 1.0, 'Cost Per Month - Premium ($)')
netflix_subscription_stats_df = pd.DataFrame(columns = ['Description', 'Country', 'Total Library Size', 'No. of TV Shows', 'No. of Movies','Cost Per Month - Basic ($)', 'Cost Per Month - Standard ($)','Cost Per Month - Premium ($)'])
#Highest basic cost per month
max_basic = netflix_df[netflix_df['Cost Per Month - Basic ($)'] == netflix_df['Cost Per Month - Basic ($)'].max()]
max_basic.insert(0, "Description", "Highest Subscription Costs", True)
#Lowest basic cost per month
min_basic = netflix_df[netflix_df['Cost Per Month - Basic ($)'] == netflix_df['Cost Per Month - Basic ($)'].min()]
min_basic.insert(0, "Description", "Lowest Subscription Costs", True)
#Highest standard cost per month
max_standard = netflix_df[netflix_df['Cost Per Month - Standard ($)'] == netflix_df['Cost Per Month - Standard ($)'].max()]
max_standard.insert(0, "Description", "Highest Standard Subscription", True)
#Lowest standard cost per month
min_standard = netflix_df[netflix_df['Cost Per Month - Standard ($)'] == netflix_df['Cost Per Month - Standard ($)'].min()]
min_standard.insert(0, "Description", "Lowest Standard Subscription", True)
#Highest premium cost per month
max_premium = netflix_df[netflix_df['Cost Per Month - Premium ($)'] == netflix_df['Cost Per Month - Premium ($)'].max()]
max_premium.insert(0, "Description", "Highest Premium Subscription", True)
#Lowest premium cost per month
min_premium = netflix_df[netflix_df['Cost Per Month - Premium ($)'] == netflix_df['Cost Per Month - Premium ($)'].min()]
min_premium.insert(0, "Description", "Lowest Premium Subscription", True)
#Adding rows to the empty dataframe
netflix_subscription_stats_df = pd.concat([netflix_subscription_stats_df, max_basic, min_basic, max_standard, min_standard, max_premium, min_premium], axis=0)
#Removing extra columns
netflix_subscription_stats_df = netflix_subscription_stats_df[['Description', 'Country', 'Cost Per Month - Basic ($)', 'Cost Per Month - Standard ($)','Cost Per Month - Premium ($)']]
netflix_subscription_stats_df
| | Description | Country | Cost Per Month - Basic ($) | Cost Per Month - Standard ($) | Cost Per Month - Premium ($) |
---|---|---|---|---|---|
33 | Highest Subscription Costs | Liechtenstein | 12.88 | 20.46 | 26.96 |
56 | Highest Subscription Costs | Switzerland | 12.88 | 20.46 | 26.96 |
59 | Lowest Subscription Costs | Turkey | 1.97 | 3.00 | 4.02 |
33 | Highest Standard Subscription | Liechtenstein | 12.88 | 20.46 | 26.96 |
56 | Highest Standard Subscription | Switzerland | 12.88 | 20.46 | 26.96 |
59 | Lowest Standard Subscription | Turkey | 1.97 | 3.00 | 4.02 |
33 | Highest Premium Subscription | Liechtenstein | 12.88 | 20.46 | 26.96 |
56 | Highest Premium Subscription | Switzerland | 12.88 | 20.46 | 26.96 |
59 | Lowest Premium Subscription | Turkey | 1.97 | 3.00 | 4.02 |
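As seen above, Liechtenstein and Switzerland share the highest subscription cost in all three tiers, while Turkey has the lowest cost in all three tiers.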
sns.set_theme(style="whitegrid")
fig_dims = (10, 6)
fig, ax = plt.subplots(figsize=fig_dims)
#Creating a DataFrame with the Top 10 countries by Subscription Cost
top_cost = netflix_df.sort_values(by=['Cost Per Month - Premium ($)'], ascending=False).head(10)
sns.set_color_codes("pastel")
sns.barplot(x=top_cost['Cost Per Month - Premium ($)'], y=top_cost['Country'], data=top_cost, label="Premium", color="b")
plt.title("Top Countries by Netflix Subscription Cost", fontsize=15)
sns.set_color_codes("muted")
sns.barplot(x=top_cost['Cost Per Month - Standard ($)'], y=top_cost['Country'], data=top_cost, label="Standard", color="b")
sns.set_color_codes("pastel")
sns.barplot(x=top_cost['Cost Per Month - Basic ($)'], y=top_cost['Country'], data=top_cost, label="Basic", color="darkblue")
# Add a legend and informative axis label
ax.set(xlim=(0, 30), ylabel="", xlabel="Subscription Cost ($)")
sns.despine(left=True, bottom=True)
# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
In the Top 10 countries by Subscription Cost, we see that the price of each tier remains almost the same from country to country (except for Switzerland and Liechtenstein).
netflix_region_df = netflix_df.groupby('Region')['Total Library Size'].mean().round()
#Converting the series netflix_region_df to a dataframe
netflix_region_df = netflix_region_df.to_frame()
#Identifying average number of TV shows offered in each region
netflix_region_df['Average No. of TV Shows'] = netflix_df.groupby('Region')['No. of TV Shows'].mean().round()
#Identifying average number of Movies offered in each region
netflix_region_df['Average No. of Movies'] = netflix_df.groupby('Region')['No. of Movies'].mean().round()
#Identifying average basic subscription cost in each region
netflix_region_df['Average Basic Subscription Cost ($)'] = netflix_df.groupby('Region')['Cost Per Month - Basic ($)'].mean().round(2)
#Identifying average standard subscription cost in each region
netflix_region_df['Average Standard Subscription Cost ($)'] = netflix_df.groupby('Region')['Cost Per Month - Standard ($)'].mean().round(2)
#Identifying average premium subscription cost in each region
netflix_region_df['Average Premium Subscription Cost ($)'] = netflix_df.groupby('Region')['Cost Per Month - Premium ($)'].mean().round(2)
#Identifying average value for money in each region
netflix_region_df['Average Value for Money'] = netflix_df.groupby('Region')['Value for Money'].mean().round()
#Renaming columns
netflix_region_df.rename(columns={'Total Library Size':'Average Overall Library Size'}, inplace=True)
#The index of this dataframe contains the region names, so we want to put that information in a separate column
#Resetting the index (turns the 'Region' index into a regular column)
netflix_region_df.reset_index(level=0, inplace=True)
netflix_region_df
| | Region | Average Overall Library Size | Average No. of TV Shows | Average No. of Movies | Average Basic Subscription Cost ($) | Average Standard Subscription Cost ($) | Average Premium Subscription Cost ($) | Average Value for Money |
---|---|---|---|---|---|---|---|---|
0 | Africa | 5736.0 | 3686.0 | 2050.0 | 6.26 | 10.05 | 12.58 | 916.0 |
1 | Asia | 5347.0 | 3377.0 | 1970.0 | 7.64 | 10.40 | 12.97 | 900.0 |
2 | Europe | 5361.0 | 3652.0 | 1709.0 | 9.23 | 13.30 | 17.55 | 599.0 |
3 | North America | 5299.0 | 3459.0 | 1840.0 | 8.08 | 11.88 | 15.20 | 661.0 |
4 | Oceania | 6099.0 | 4026.0 | 2072.0 | 8.32 | 12.32 | 16.66 | 736.0 |
5 | South America | 4927.0 | 3156.0 | 1771.0 | 6.71 | 9.62 | 12.56 | 802.0 |
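The regional averages above can be turned into explicit rankings, which makes the per-region comparisons easier to verify. A minimal sketch in plain Python, with the values copied from the table (region_library, region_value and the two ranking lists are illustrative names):

```python
# Average overall library size per region, copied from the table above
region_library = {
    "Africa": 5736, "Asia": 5347, "Europe": 5361,
    "North America": 5299, "Oceania": 6099, "South America": 4927,
}

# Average value for money per region, copied from the table above
region_value = {
    "Africa": 916, "Asia": 900, "Europe": 599,
    "North America": 661, "Oceania": 736, "South America": 802,
}

# Sort region names from highest to lowest on each metric
library_ranking = sorted(region_library, key=region_library.get, reverse=True)
value_ranking = sorted(region_value, key=region_value.get, reverse=True)

print("By library size:   ", library_ranking)
print("By value for money:", value_ranking)
```

Note that Africa is represented by a single country in this dataset, so its regional average is simply South Africa's figures.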
sns.set_theme(style="whitegrid")
fig_dims = (10, 6)
fig, ax = plt.subplots(figsize=fig_dims)
#Creating a DataFrame of regions sorted by average library size
region_lib = netflix_region_df.sort_values(by=['Average Overall Library Size'], ascending=False).head(10)
sns.set_color_codes("pastel")
sns.barplot(x=region_lib['Average Overall Library Size'], y=region_lib['Region'], label="All Content", color="b")
plt.title("Regions by Netflix Library Size", fontsize=15)
sns.set_color_codes("muted")
sns.barplot(x=region_lib['Average No. of TV Shows'], y=region_lib['Region'], label="TV Shows", color="b")
sns.set_color_codes("pastel")
sns.barplot(x=region_lib['Average No. of Movies'], y=region_lib['Region'], label="Movies", color="darkblue")
# Add a legend and informative axis label
ax.set(xlim=(0, 8000), ylabel="", xlabel="Library Size")
sns.despine(left=True, bottom=True)
# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
sns.set_theme(style="whitegrid")
fig_dims = (10, 6)
fig, ax = plt.subplots(figsize=fig_dims)
#Creating a DataFrame of regions sorted by average Premium subscription cost
region_cost = netflix_region_df.sort_values(by=['Average Premium Subscription Cost ($)'], ascending=False)
sns.set_color_codes("pastel")
sns.barplot(x=region_cost['Average Premium Subscription Cost ($)'], y=region_cost['Region'], label="Premium", color="b")
plt.title("Regions by Netflix Subscription Cost", fontsize=15)
sns.set_color_codes("muted")
sns.barplot(x=region_cost['Average Standard Subscription Cost ($)'], y=region_cost['Region'], label="Standard", color="b")
sns.set_color_codes("pastel")
sns.barplot(x=region_cost['Average Basic Subscription Cost ($)'], y=region_cost['Region'], label="Basic", color="darkblue")
# Add a legend and informative axis label
ax.set(xlim=(0, 20), ylabel="", xlabel="Subscription Cost ($)")
sns.despine(left=True, bottom=True)
# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
netflix_df['Value for Money'].describe().round().to_frame()
| | Value for Money |
---|---|
count | 65.0 |
mean | 700.0 |
std | 342.0 |
min | 237.0 |
25% | 555.0 |
50% | 628.0 |
75% | 753.0 |
max | 2355.0 |
netflix_df[(netflix_df['Value for Money']==237)|(netflix_df['Value for Money']==2355)]
| | Country_code | Country | Region | Total Library Size | No. of TV Shows | No. of Movies | Cost Per Month - Basic ($) | Cost Per Month - Standard ($) | Cost Per Month - Premium ($) | Value for Money |
---|---|---|---|---|---|---|---|---|---|---|
33 | li | Liechtenstein | Europe | 3048 | 1712 | 1336 | 12.88 | 20.46 | 26.96 | 237.0 |
59 | tr | Turkey | Asia | 4639 | 2930 | 1709 | 1.97 | 3.00 | 4.02 | 2355.0 |
On average across the 65 countries, a Netflix customer has access to about 700 TV Shows and Movies for every dollar spent on a Basic subscription.
By comparison, a customer in Turkey (2355 titles per dollar) gets roughly 10 times the value from their Netflix subscription compared to a customer in Liechtenstein (237 titles per dollar).
sns.set_theme(style="whitegrid")
fig_dims = (10, 6)
fig, ax = plt.subplots(figsize=fig_dims)
#Creating a DataFrame with the top 10 countries by Value for Money
value = netflix_df.sort_values(by=['Value for Money'], ascending=False).head(10)
sns.set_color_codes("pastel")
sns.barplot(x=value['Value for Money'], y=value['Country'], color="b")
plt.title("Top Countries by Customer Value for Money", fontsize=15)
Text(0.5, 1.0, 'Top Countries by Customer Value for Money')
Customers in Turkey and India get the greatest value for money from their Basic Netflix Subscription.
sns.set_theme(style="whitegrid")
fig_dims = (10, 6)
fig, ax = plt.subplots(figsize=fig_dims)
#Creating a DataFrame with the bottom 10 countries by Value for Money
value = netflix_df.sort_values(by=['Value for Money'], ascending=True).head(10)
sns.set_color_codes("pastel")
sns.barplot(x=value['Value for Money'], y=value['Country'], color="b")
plt.title("Top Countries by Lowest Customer Value for Money", fontsize=15)
Text(0.5, 1.0, 'Top Countries by Lowest Customer Value for Money')
Customers in Liechtenstein, Croatia and San Marino get the least value for money from their Basic Netflix Subscription.
sns.set_theme(style="whitegrid")
fig_dims = (10, 6)
fig, ax = plt.subplots(figsize=fig_dims)
#Creating a DataFrame of regions sorted by average Value for Money
value_region = netflix_region_df.sort_values(by=['Average Value for Money'], ascending=False)
sns.set_color_codes("pastel")
sns.barplot(x=value_region['Average Value for Money'], y=value_region['Region'], color="b")
plt.title("Customer Value for Money by Region", fontsize=15)
Text(0.5, 1.0, 'Customer Value for Money by Region')
Another way to determine the customer’s value for money is to look at countries where customers get access to an above average library size at a below average basic subscription cost.
#Identifying countries with below average basic cost and above average library size
highlight = netflix_df[(netflix_df['Cost Per Month - Basic ($)']<netflix_df['Cost Per Month - Basic ($)'].mean())&(netflix_df['Total Library Size']>netflix_df['Total Library Size'].mean())]
len(highlight)
10
We have 10 countries where customers get access to an above average library size at a below average basic subscription cost.
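The two-condition filter can be illustrated on a handful of rows. This sketch uses plain Python and the rounded means reported by describe() earlier (5314 titles and $8.37) as thresholds, so borderline countries could be classified differently than with the exact means. The rows are copied from outputs in this notebook and the variable names are illustrative.

```python
# Rounded means from the describe() outputs (assumption: exact means differ slightly)
MEAN_LIBRARY = 5314
MEAN_BASIC_COST = 8.37

# A few rows copied from this notebook's outputs
rows = [
    {"Country": "India", "library": 5843, "basic": 2.64},
    {"Country": "Ukraine", "library": 5336, "basic": 5.64},
    {"Country": "Turkey", "library": 4639, "basic": 1.97},
    {"Country": "Liechtenstein", "library": 3048, "basic": 12.88},
    {"Country": "Austria", "library": 5640, "basic": 9.03},
]

# Keep only countries that are cheaper than average AND have a larger library
selected = [
    r["Country"]
    for r in rows
    if r["basic"] < MEAN_BASIC_COST and r["library"] > MEAN_LIBRARY
]
print(selected)
```

Turkey is cheap but has a below average library, and Austria has a large library but an above average price, so neither passes both conditions.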
fig_dims = (15, 5)
fig, ax = plt.subplots(figsize=fig_dims)
sns.scatterplot(x='Total Library Size',y='Cost Per Month - Basic ($)', data=netflix_df, s=50, color="lightblue")
sns.scatterplot(x='Total Library Size',y='Cost Per Month - Basic ($)', data=highlight, s=50, color="green")
plt.title("Content vs. Subscription Cost",fontsize=15)
Text(0.5, 1.0, 'Content vs. Subscription Cost')
Points highlighted in green indicate countries with access to an above average library size at a below average basic subscription cost.
#Listing 3 highlighted countries by Total Library Size (ascending=True puts the smallest libraries first; use ascending=False for the largest)
highlight.sort_values(by=['Total Library Size'], ascending=True).head(3)
| | Country_code | Country | Region | Total Library Size | No. of TV Shows | No. of Movies | Cost Per Month - Basic ($) | Cost Per Month - Standard ($) | Cost Per Month - Premium ($) | Value for Money |
---|---|---|---|---|---|---|---|---|---|---|
60 | ua | Ukraine | Europe | 5336 | 3261 | 2075 | 5.64 | 8.46 | 11.29 | 946.0 |
48 | ru | Russia | Europe | 5711 | 3624 | 2087 | 8.13 | 10.84 | 13.56 | 702.0 |
52 | za | South Africa | Africa | 5736 | 3686 | 2050 | 6.26 | 10.05 | 12.58 | 916.0 |
#Identifying the 3 highlighted countries with the lowest Basic Subscription Cost
highlight.sort_values(by=['Cost Per Month - Basic ($)'], ascending=True).head(3)
| | Country_code | Country | Region | Total Library Size | No. of TV Shows | No. of Movies | Cost Per Month - Basic ($) | Cost Per Month - Standard ($) | Cost Per Month - Premium ($) | Value for Money |
---|---|---|---|---|---|---|---|---|---|---|
26 | in | India | Asia | 5843 | 3718 | 2125 | 2.64 | 6.61 | 8.60 | 2213.0 |
60 | ua | Ukraine | Europe | 5336 | 3261 | 2075 | 5.64 | 8.46 | 11.29 | 946.0 |
52 | za | South Africa | Africa | 5736 | 3686 | 2050 | 6.26 | 10.05 | 12.58 | 916.0 |
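Sort direction matters when ranking by library size: an ascending sort lists the smallest libraries first. The sketch below sorts in descending order using only the four highlighted countries that are visible in the outputs above. The other six highlighted countries are not shown, so the real top 3 may differ. Variable names are illustrative.

```python
# (country, total library size) for the highlighted rows visible in the outputs above.
# Assumption: this is only a subset of the 10 highlighted countries.
known_highlight = [
    ("Ukraine", 5336),
    ("Russia", 5711),
    ("South Africa", 5736),
    ("India", 5843),
]

# Largest libraries first (the pandas equivalent is sort_values(..., ascending=False))
largest_first = sorted(known_highlight, key=lambda row: row[1], reverse=True)
top3 = [name for name, _ in largest_first[:3]]
print(top3)
```

Within this subset, India has the largest library, followed by South Africa and Russia.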
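In terms of Content, the 3 countries returned are Ukraine, Russia and South Africa. Note that the sort is ascending, so these are the 3 smallest libraries within the highlighted group; India (5843), for example, has a larger library than all three.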
In terms of Basic Subscription Cost, the top 3 (cheapest) countries for customers are India, Ukraine and South Africa.
Taking both Content and Subscription Cost into consideration, Ukrainian and South African customers receive the most value for their Netflix Subscription, since both countries appear in both lists.
We have data on 65 countries where Netflix operates across 6 regions.
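In terms of number of countries served, Europe leads with 34 of the 65 countries in the dataset, followed by Asia (12) and South America (10).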
Based on the data we can say that the majority of Netflix’s content in any given country consists of TV Shows.
The same subscription tier can cost very different amounts depending on the country: for the Premium tier, the gap between the most expensive country ($26.96) and the cheapest ($4.02) is $22.94 per month.
On average across the 65 countries, a Netflix customer has access to about 700 TV Shows and Movies for every dollar spent on a Basic subscription.
However, the disparity in value for money from country to country is large: the highest-value country offers almost 10 times the value for money of the lowest.
Content: European countries on average have access to the 3rd largest content library, only slightly bigger than Asian countries.
Subscription Cost and Value for Money: Customers experience the least value for money for their Basic Netflix Subscription, and the region has the highest subscription cost across all three tiers.
Country Details:
Content: Asian countries on average have access to the 4th largest content library
Subscription Cost and Value for Money: Customers experience the 2nd highest value for money for their Basic Netflix Subscription and have the 4th highest overall subscription cost
Country Details:
Content: African countries on average have access to the 2nd largest content library
Subscription Cost and Value for Money: Customers experience the greatest value for money for their Basic Netflix Subscription and it has the lowest basic subscription cost
Content: North American countries on average have access to the 2nd smallest overall content library
Subscription Cost and Value for Money: Customers experience the 2nd lowest value for money for their Basic Netflix Subscription and it has the 3rd highest overall subscription cost.
Content: South American Countries on average have access to the smallest overall content library
Subscription Cost and Value for Money: Customers experience the 3rd highest value for money for their Basic Netflix Subscription and it has the lowest Standard and Premium subscription costs (its Basic cost is the 2nd lowest, after Africa).
Content: Oceanian countries on average have access to the largest overall content library
Subscription Cost and Value for Money: Customers experience the 4th highest value for money for their Basic Netflix Subscription and it has the 2nd highest overall subscription cost.
A Netflix customer’s primary concern is finding a balance between the value of entertainment and the cost of that entertainment. Ideally, the customer is seeking to pay the lowest amount possible for access to the most content. Based on the information above, we can see that customers in Africa would experience the most satisfaction for three key reasons:
Content: Access to the 2nd largest content library
Subscription Cost: It has the lowest basic subscription cost
Value for Money: Region with the greatest value for money
In particular, South Africa, which ranks 3rd in both lists of countries with above average access to content and below average basic subscription cost, is the best location for Netflix customers.
Additional Data necessary
The data only tells how much content is available to customers in each country and how much the different subscription tiers cost. For a holistic comparison, we need to consider factors such as
Together this data can help paint a detailed picture of what customers enjoy the most and where the most satisfied customers may be located.