Archive for the ‘amazon’ Category

Nginx-Fu: X-Accel-Redirect From Remote Servers

June 24th, 2010

We use nginx and its features a lot at Scribd. Many times over the last year we needed a pretty interesting but unsupported feature: we wanted nginx's X-Accel-Redirect functionality to work with remote URLs. Out of the box, nginx supports this functionality for local URIs only. In this short post I want to explain how we made nginx serve remote content via X-Accel-Redirect.

First of all, here is why you may need this feature. Imagine you have file storage on Amazon S3 where you keep tons of content, and an application with download functionality that you want to restrict to logged-in/paying/premium users and/or whose downloads you want to track. If your content were on your web server, you could use the simple controlled-download functionality built into nginx out of the box. The problem is that your content is remote.

Here is what we do to solve this problem.

First, we create a special location on our nginx server. This location will be used as a proxy for all our accelerated file downloads:

# Proxy download
location ~* ^/internal_redirect/(.*?)/(.*) {
    # Do not allow people to mess with this location directly
    # Only internal redirects are allowed
    internal;

    # Location-specific logging
    access_log logs/internal_redirect.access.log main;
    error_log logs/internal_redirect.error.log warn;

    # Extract download url from the request
    set $download_uri $2;
    set $download_host $1;

    # Compose download url
    set $download_url http://$download_host/$download_uri;

    # proxy_pass with variables resolves the host name at request
    # time, so nginx needs a resolver (unless one is already set
    # at the http/server level); 127.0.0.1 is a placeholder for
    # your DNS server
    resolver 127.0.0.1;

    # Set download request headers
    proxy_set_header Host $download_host;
    proxy_set_header Authorization '';

    # The next two lines could be used if your storage
    # backend does not support the Content-Disposition
    # header used to specify the file name browsers use
    # when saving content to disk
    proxy_hide_header Content-Disposition;
    add_header Content-Disposition 'attachment; filename="$args"';

    # Do not touch local disks when proxying
    # content to clients
    proxy_max_temp_file_size 0;

    # Download the file and send it to the client
    proxy_pass $download_url;
}

After adding this location to our nginx config we could start sending responses with headers like the following:

# This header will ask nginx to download a file
# from http://some.site.com/secret/url.ext and send it to user
X-Accel-Redirect: /internal_redirect/some.site.com/secret/url.ext

# This header will ask nginx to download a file
# from http://blah.com/secret/url and send it to user as cool.pdf
X-Accel-Redirect: /internal_redirect/blah.com/secret/url?cool.pdf
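To make the mapping concrete, here is a small Ruby model of what the location above does with an X-Accel-Redirect value. The helper `resolve_internal_redirect` is hypothetical (it is not part of nginx or of our setup): it applies the same regex to the path, rebuilds the upstream URL from `$1` and `$2`, and treats the query string as `$args` for the Content-Disposition file name. It is only a sketch for reasoning about the config; nginx also normalizes the URI (for example, decoding percent-escapes and merging slashes) before matching, which this model skips.

```ruby
# Hypothetical model of the /internal_redirect/ location (not nginx code).
# Same pattern as the location; ~* means case-insensitive, hence /i.
INTERNAL_REDIRECT = %r{\A/internal_redirect/(.*?)/(.*)}i

# Returns [upstream_url, content_disposition] for an X-Accel-Redirect
# value, or nil if the location would not match it.
def resolve_internal_redirect(header_value)
  # nginx matches locations against the path; the query string becomes $args
  path, args = header_value.split('?', 2)
  match = INTERNAL_REDIRECT.match(path)
  return nil unless match

  download_host = match[1] # $1: lazy group, stops at the first slash
  download_uri  = match[2] # $2: the rest of the path
  download_url  = "http://#{download_host}/#{download_uri}"

  # add_header is unconditional, so a missing query gives filename=""
  [download_url, %(attachment; filename="#{args}")]
end

url, disposition = resolve_internal_redirect('/internal_redirect/blah.com/secret/url?cool.pdf')
puts url         # prints http://blah.com/secret/url
puts disposition # prints attachment; filename="cool.pdf"
```

Because `add_header` in the location always fires, a redirect without a query string produces `filename=""`; the model reproduces that so the behavior is easy to spot.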

Here is some example code you could use in a Rails application to send downloads through our internal redirect location:

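def x_accel_url(url, file_name = nil)
  # Strip only the leading scheme; the nginx location adds http:// back
  uri = "/internal_redirect/#{url.sub(%r{\Ahttp://}, '')}"
  # The optional file name travels in the query string ($args in nginx)
  uri += "?#{file_name}" if file_name
  uri
end

def download
  headers['X-Accel-Redirect'] = x_accel_url(some_secret_url, pretty_name)
  # Send an empty body; nginx replaces it with the proxied file
  # (on Rails 5.1 and later, use `head :ok` instead)
  render :nothing => true
end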
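For experimenting outside Rails, here is a standalone copy of the helper with a couple of usage examples. The scheme check is our own addition, an assumption about how you might want to fail: the nginx location always rebuilds the upstream URL with `http://`, so an `https://` URL passed through unchanged would end up as a garbled path like `/internal_redirect/https://...` rather than a usable download.

```ruby
# Standalone version of x_accel_url; the ArgumentError guard is a
# hypothetical addition, not part of the original helper.
def x_accel_url(url, file_name = nil)
  unless url.start_with?('http://')
    raise ArgumentError, "only http:// URLs can be proxied: #{url}"
  end

  # Drop the scheme; the nginx location re-adds http://
  uri = "/internal_redirect/#{url.sub(%r{\Ahttp://}, '')}"
  # Optional download name goes into the query string ($args)
  uri += "?#{file_name}" if file_name
  uri
end

puts x_accel_url('http://some.site.com/secret/url.ext')
# prints /internal_redirect/some.site.com/secret/url.ext
puts x_accel_url('http://blah.com/secret/url', 'cool.pdf')
# prints /internal_redirect/blah.com/secret/url?cool.pdf

begin
  x_accel_url('https://bucket.example.com/file')
rescue ArgumentError => e
  puts e.message
end
```

Failing loudly in the application is a design choice: it surfaces a bad URL at the point where it is generated instead of as a confusing upstream error in the nginx logs.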

As you can see, nginx is a really powerful tool, and when you turn your creativity on you can make it even more powerful. Stay tuned for more Nginx-Fu posts.




Amazon now accepts hard drives for EC2 data transfer

June 13th, 2010

I guess they got tired of people sending angry emails about data transfer fees:

“Amazon provides an online calculator to help customers decide whether it makes financial sense to ship data via mail rather than uploading over the Internet. You plug in the number of terabytes, devices, average file size, return shipping information and other factors, and find out how much the data transfer would cost via mail compared to standard Internet uploads.

For example, transferring data from a single device containing 2TB would require 26 hours of data loading time and cost $144.74. Uploading the same amount of data over the Internet would cost $204.80. The calculator does not show how long the Internet transfer would take.”

http://www.networkworld.com/news/2010/061010-amazon-cloud-fedex.html
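The quoted figures can be reproduced with simple arithmetic. The sketch below assumes AWS Import/Export pricing of $80 per device plus $2.49 per data-loading hour, $0.10 per GB for Internet uploads, and binary terabytes (2TB = 2048 GB). These rates are our assumptions, not stated in the article; we picked them because they reproduce both numbers exactly.

```ruby
# Assumed rates, in US cents (not from the article; chosen because
# they reproduce its figures).
DEVICE_FEE_CENTS     = 8_000 # AWS Import/Export: per storage device
LOADING_HOUR_CENTS   = 249   # AWS Import/Export: per data-loading hour
INBOUND_CENTS_PER_GB = 10    # Internet upload: per GB transferred in

GB_PER_TB = 1024 # assumes binary terabytes

# Cost of shipping data on physical devices.
def mail_cost_cents(devices, loading_hours)
  devices * DEVICE_FEE_CENTS + loading_hours * LOADING_HOUR_CENTS
end

# Cost of uploading the same data over the Internet.
def internet_cost_cents(terabytes)
  terabytes * GB_PER_TB * INBOUND_CENTS_PER_GB
end

# Format integer cents as dollars.
def dollars(cents)
  format('$%.2f', cents / 100.0)
end

# 2TB on a single device with 26 hours of loading time (from the article)
puts "By mail:      #{dollars(mail_cost_cents(1, 26))}" # prints By mail:      $144.74
puts "Via Internet: #{dollars(internet_cost_cents(2))}" # prints Via Internet: $204.80
```

Working in integer cents keeps the arithmetic exact until the final formatting step.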



Piper Jaffray on the Cloud

March 16th, 2010

Piper Jaffray has published a 300+ page study on the cloud computing industry based on a recent survey of 100 CIOs. The bottom line: cloud computing is expected to grow significantly over the next five years.

    Survey respondents expect the mix of cloud computing to escalate strongly to 13.5% in five years. This equates to a five-year CAGR of 19.2%, or 23.9% when we also incorporate IDC’s forecast that total software budgets will grow 4.7% annually. In other words, software spending will grow gradually in the next five years, but the mix of spend allocated to cloud-based applications will likely surge rapidly. Another way to think about the data is that the Cloud Computing market is expected to grow five times as fast as the broader software market: 23.9% vs. 4.7%.

If anything, I think the prediction is conservative, and the impact could be much larger once mainstream adoption occurs. But the risk is that adoption takes longer, just as it did for open source software. And as the report indicates, open source is powering much of the cloud computing that’s going on:

    The next-generation Cloud Computing data centers are NOT running Microsoft Windows; they are increasingly leveraging the compelling economics of open source components. For example, the data centers powering Amazon, Google, and salesforce.com all run on Linux and other open source technologies. In fact, Red Hat’s operating system and the MySQL database are key components to many of the leading-edge Clouds being developed today. 

Why is this occurring? Because open source leverages a global community development process which results in a product that evolves rapidly, provides transparency into the source code dynamics, and surpasses other products in terms of security and reliability – all at a lower total cost of ownership (TCO) than traditional offerings.



Oracle/Sun vs. The Cloud

December 22nd, 2009

Larry Ellison makes it very clear that Oracle believes in a back-to-the-future model where software and hardware meld together into “systems”: purpose-built, integrated solutions. In other words, you won’t buy an Oracle database and a server and configure it to run a data warehouse; instead you’ll buy the “Oracle Data Warehouse Server.” The first such system is Exadata, which is apparently doing quite well, according to Ellison.

This is classic bundling, although some may call it a tying strategy. Microsoft, seeing that they couldn’t win each office productivity segment individually (word processing, spreadsheets and presentations), decided to play to their strength and bundle them into a solution that no individual company could compete with. That is bundling. Tying is where Microsoft used their dominance in the operating system to tie the browser to the OS, thereby owning the browser market. In Oracle’s case, one could make a case for either bundling or tying. I’m making neither a value judgment nor a legal judgment about Oracle’s strategy; I am just providing historical context.

Ellison points to Cisco and IBM, under T.J. Watson Jr., as examples of successful systems companies. But my question is simple: will this back-to-the-future strategy work against the cloud? Assembling solutions from pre-packaged systems is certainly easier than starting with more granular components like hardware and software. But does it really stack up against today’s benchmark, the cloud?

Let me use a transportation analogy:

Assembling all of the components (hardware, software, etc.): Like building a car piece by piece

Assembling systems (a la Oracle's Exadata and Cisco): Like building a car by installing large-grained components such as the chassis, wheels and engine

Using the cloud: Like buying a pre-built car off the lot

SaaS Applications: Like riding the subway

Most people are perfectly happy either buying a car or riding the subway. For really high-end performance, some may want to build their own car with components or by hand, but it’s a relatively small market.

I don’t expect any public cloud offerings to satisfy high-end enterprise demands…yet. But I have to admit, the cloud is evolving quite rapidly. Just look at Amazon and their introduction of Virtual Private Clouds, Elastic Block Store (a SAN in the sky), Boot from EBS, etc. I can launch an entire cluster with a mouse click, without talking to IT. How can you beat that? Historical precedent is also on the side of commodity technologies, like the cloud, growing up to cannibalize the high end. The PC cannibalized the workstation, which cannibalized the mini, which cannibalized the mainframe. From the cloud’s perspective, the trend is their friend.

The cloud won’t seriously threaten large enterprise systems for quite some time, but I believe it is just a matter of time. Oracle can certainly ride a strong wave of current demand for systems. I expect that in time they will also provide a compelling suite of solutions in the cloud. But if I were a bettin’ man I’d have to bet on the cloud; they have simplicity and history on their side. On the other hand, it is hard to bet against Ellison.


Comparing Cloud Databases: SimpleDB, RDS and ScaleDB

October 30th, 2009

Amazon’s SimpleDB isn’t a relational database, but it does provide elastic scalability and high availability. Amazon’s recently announced Relational Database Service (RDS) is a relational database, but it doesn’t provide elastic scalability or high availability. If you are deploying enterprise applications in the cloud (including Amazon Web Services), you might want to look at ScaleDB, because it is a relational database and it does provide elastic scalability and high availability.

Amazon describes SimpleDB by comparing it to a clustered database:

"A traditional, clustered relational database requires a sizable upfront capital outlay, is complex to design, and often requires extensive and repetitive database administration. Amazon SimpleDB is dramatically simpler, requiring no schema, automatically indexing your data and providing a simple API for storage and access. This approach eliminates the administrative burden of data modeling, index maintenance, and performance tuning. Developers gain access to this functionality within Amazon’s proven computing environment, are able to scale instantly, and pay only for what they use."

In other words, if there were a clustered database that was cost-efficient, simple, low-maintenance, and dynamically elastic, that would be ideal. That is exactly what ScaleDB provides. Granted, it isn’t as simple to use as SimpleDB (just look at the names: one is Simple, the other is Scale), but it does eliminate data partitioning and slaves/replication, both of which account for the bulk of the pain in clustering. ScaleDB also runs MySQL applications without modification.

Amazon, in a nod to SQL developers and MySQL applications, released Relational Database Service (RDS) this week. This too comes up short of Amazon’s ideal of a dynamically scalable and highly available MySQL database. Again, that is exactly what ScaleDB provides.

Comparing SimpleDB, RDS and ScaleDB

| Function | SimpleDB | RDS | ScaleDB |
|---|---|---|---|
| Transactions | No | Yes | Yes |
| Joins | No | Yes | Yes (1) |
| Data Consistency | No (Eventual) | Yes | Yes (2) |
| SQL Support | No | Yes | Yes |
| ACID Compliant | No | Yes | Yes |
| Exploits EBS | No | Yes | Yes |
| Supports MySQL applications without modification | No | Yes | Yes |
| Dynamic Elasticity (w/o interrupting the application) | Yes | No | Yes |
| High-Availability | Yes | No | Yes |
| Eliminates Partitioning | Yes | No | Yes |
| Eliminates possible 5-minute data loss upon failure | Yes | No | Yes |
| Cluster-level load balancing | Yes | No | Yes |

(1) The ScaleDB index delivers multi-table joins with the performance of a single table lookup using a technology that rivals materialized views but without the data synchronization headache.

(2) ScaleDB’s shared-disk architecture ensures data consistency across all nodes in the cluster.

ScaleDB is a storage engine that plugs into MySQL. It turns MySQL into a shared-disk DBMS, like Oracle RAC. ScaleDB running on AWS provides elastic scalability, adding and removing nodes according to the number of database connections, all without interrupting any running applications. Also, because ScaleDB doesn’t rely on data partitioning (as you would with shared-nothing databases), setup and tuning are very simple.

SimpleDB and RDS are very good, and they have their roles. However, I believe that ScaleDB is the high-end solution, without the high-end price, that enterprise users of the cloud are looking for.

