ruby

Installing Phusion Passenger on Amazon Linux AMI 1.0

Once again, this is a progression of building out my Amazon Linux AMI, so the pre-requisites might be off, since I've previously installed a number of other things. And once again, this is simply a log of tasks for my own future reference, rather than a build recipe. Maybe this will be useful to someone else as well, so I've gone back and tagged all AMI articles with aws-linux-ami, so you can at least see the history of pre-requisites.

Anyway, this time I'm installing Phusion Passenger to host the ASP.NET app I ported to Rails last week. The AMI comes with Ruby 1.8.7. I next installed the following packages:

  yum install libcurl-devel openssl-devel mysql-devel ruby-devel rubygems

Even though RubyGems is now installed, it's not current enough for Rails, so the first thing to do is upgrade it:

  gem update --system

I also had rails fail to install, with

  Installing ri documentation for rails-3.0.3...
  File not found: lib

Which I fixed by rebuilding rdoc:

  gem install rdoc-data
  rdoc-data --install
  gem rdoc --all --overwrite

Now it's finally time to install and build Rails with MySQL support (which is how I set my Rails application up) and Passenger:

  gem install mysql2
  gem install rails
  gem install passenger

Next, build the passenger apache2 module. I actually killed the install the first time around because libcurl-devel and openssl-devel were missing. The installer assured me that it would guide me through getting those dependencies resolved, but I wanted to make sure they came in through yum rather than have this installer download and build them from source. Anyway the command was:

  passenger-install-apache2-module

This installed flawlessly and ended with instructions to put the following in my apache config:

  LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-3.0.2/ext/apache2/mod_passenger.so
  PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-3.0.2
  PassengerRuby /usr/bin/ruby

A git diversion

Before getting to the apache setup of my rails app, I ran into this error trying to check the port out from my repo:

  warning: remote HEAD refers to nonexistent ref, unable to checkout.

I don't know how this happened, since other gitosis repos I've created haven't had the same problem, but running

  git push --all

on my development machine did the job. Apparently I had been pushing changes into the repo without a branch ever being created on the remote, because that command reported:

  * [new branch]      master -> master

Well, fortunately after that all was good :)

Configuring rails in apache

Finally, the apache vhost config was exceedingly simple:

   <VirtualHost *:80>
      ServerName www.yourhost.com
      # Be sure to point DocumentRoot at 'public'!
      DocumentRoot /somewhere/public
      <Directory /somewhere/public>
         # Relax Apache security settings
         AllowOverride all
         # MultiViews must be turned off
         Options -MultiViews
      </Directory>
   </VirtualHost>

The important thing is that the DocumentRoot needs to point to the Rails public directory, not the root of the Rails application.

The last task was running

  rake db:create:all

to set up the expected db locally. After that, and an apache restart, the app came up without a hitch.

Of course, while setting all this up, I finally figured out why mod_mono was leaking semaphores, making all of this likely moot. But I'm glad to have this alternative while I determine whether the mod_mono behavior is really fixed.

Porting ASP.NET MVC to Ruby on Rails

This isn't yet another .NET developer defecting to Ruby. I have very little interest in making Ruby my primary language. I've done a couple of RoR projects over the years, nothing serious I admit, but I just don't seem to enjoy it in the way that so many of my peers do. That said, RoR does hit a sweet spot for websites. The site I'm porting has very little in terms of business logic (it's primarily HTML templating with navigation), so this was an exercise to circumvent my mod_mono issues.

I'm a huge C# fanboy, but having worked with ASP.NET MVC for a while I have to admit that the amount of cruft one has to assemble to stay DRY in ASP.NET templating is just not worthwhile. While views can be strongly typed, it's an exercise in frustration trying to write templates generically. Maybe this becomes easier with dynamic usage in MVC3, but I haven't checked it out. What certainly doesn't help is that the MVC team decided to make TemplateHelper internal, turning the addition of helpers in the vein of .DisplayFor or .EditorFor into a major task that still ends up being a pile of hacks. Now I'm not an ASP.NET MVC expert and there are probably a lot of extension points I just don't know about. But the articles on extending it that I have found are usually pages of code. I shouldn't have to become a framework internals expert just to add some generic templating extensibility.

Ok, enough ranting. ASP.NET MVC is still a huge improvement over webforms, but right now I'm watching Manos de Mono and OWIN to see what develops in .NET land for websites there. The ASP.NET stack, in my opinion, is just too heavy for something that should be simple.

So, why RoR instead of node.js, since I claimed that I was going to get serious about javascript this year? Mostly because this port has a deadline, so use what you know applies, and it's a production site, so use known stable tech applies. Another benefit was that RoR uses the same <% %> syntax as webforms views and MVC was clearly heavily inspired by RoR.

I ported the site over 3 nights, maybe 10 hours of cumulative seat time which feels like time well spent. Strategic search and replace got me 80% there, faking Html. for my custom extension in RoR got me another 10%, leaving only 10% for actual new business logic written in ruby. Once I get to more complex business logic for the site I may stick to Ruby, although I know I'll be sorely tempted to write it as REST services in C# on top of Dream.

IronRuby isn’t dead, it’s been set free on a farm upstate

Yesterday Jimmy Schementi posted his farewell to Microsoft and with it his thoughts on the future of IronRuby. This made it onto twitter almost immediately as "IronRuby is dead". This was shortly followed by a number of voices countering with, "No it's not, it's OSS, step up community!". I think that's rather missing the point.

Microsoft needs Ruby more than Ruby needs Microsoft

I'll even go as far as saying Ruby has no need for Microsoft. But Microsoft has a clear problem with attracting a community of passionate developers building the next generation of apps. There is a large, deeply entrenched community building enterprise apps that is going to stay loyal for a long time, but just as Office will not stay dominant as a monolithic desktop app, the future of development for Microsoft needs to be web based and there isn't a whole lot of fresh blood coming in that way.

Fresh blood that is passionate and vocal is flocking to Ruby, Scala, Clojure, node.js, etc. Even the vocal developers on the MS stack are pretty much all playing with Ruby or another dynamic language in some capacity or other. Maybe you think that those people are just a squeaky wheel minority, and maybe you are right. But minority or not, they are people who shape the impressions that the next wave of newcomers sees first. It's the bleeding edge guys that pave the road others will follow.

Startup technology decisions are done via peer networks, not by evaluating vendor marketing messages. Instead of trying to attract and keep alpha geeks, Microsoft is pushing technologies like WebMatrix and Lightswitch, as if drag-n-drop/no-code development wasn't an already reviled stereotype of the MS ecosystem.

Some have said that IronRuby is not in the interest of Microsoft. It would just be a gateway drug that makes it even easier to jump the MS ship. Sorry, but that cat's long out of the bag. Ruby's simplicity and ecosystem make jumping ship already as easy as could be. And right now, once they jump ship, with no integration story, they will quickly lose any desire to hold on to their legacy stack.

IronRuby, while no panacea to these problems, at least offered a way for people considering .NET to not have to choose Ruby or .NET. And it had the potential to expose already devoted Ruby fans to what .NET could offer on top (I know that one is a much harder sell). If you look at the Ruby space, Ruby often benefits from having another language backing it up, which makes Ruby front-end with Scala back-end popular. And if IronRuby were competitive, being able to bridge a Ruby front-end with a C# or F# back-end and have the option to stay in-process is a story worth trying to sell.

Let the community foster IronRuby, that's what OSS is all about!

Ok, that sounds idealistic and lovely, but, um, what community are you talking about? The Ruby community? They're doing fine without IronRuby. The .NET community? Well, OSS on .NET is tiny compared to all other stacks.

Most OSS projects are 99% consumers with 1% actual committers. And that's fine, those 99% consumers grow the community and with it the pool of one percenters increases as well. But that only works if the project has an appeal to grow the 99% pool. It is virtually impossible to reach the bulk of the .NET pool without a strong push from Microsoft. I'm always amazed how many .NET developers that I meet are completely oblivious to technology not emanating from Redmond. And these are not just people that stumbled into VB because their Excel macros didn't cut it anymore. There are good, smart developers out there that have been well served by Microsoft and have not felt the need to look outside. For a vendor that's a great captive audience, worth a lot of money, so it's been in the interest of Microsoft to keep it that way. But that also means that for a project to get momentum beyond alpha geeks on the MS stack, it's gotta be pushed by Microsoft. It needs to be a first-class citizen on the platform, built into Visual Studio, etc.

The IronRuby catch-22

IronRuby has the potential to draw fresh blood into the existing ecosystem, but won't unless it's already got a large momentum inside of the .NET ecosystem. Nobody is going to use IronRuby instead of Ruby because it's almost as good. You can't get that momentum without leveraging the captive audience. And you can't leverage that captive audience without real support from within Microsoft. The ball's been in Microsoft's court, but apparently it rolled under a table and has been forgotten.

By arne on | .net, geek | 4 comments

Sharing data without sharing data state

I'm taking a break from Promise for a post or two to jot down some stuff that I've been thinking about while discussing future enhancements to MindTouch Dream with @bjorg. In Dream all service to service communication is done via HTTP (although the traffic may never hit the wire). This is very powerful and flexible, but also has performance drawbacks, which have led to many data sharing discussions.

Whether you are using data as a message payload or just putting data in a cache, you want sender and receiver to be unable to see each other's interaction with that data, which is what would happen if the data were a shared, mutable instance. Allowing shared modification, whether on purpose or by accident, can have very problematic consequences:

  1. Data corruption: Unless you wrap the data with a lock, two threads could try to modify the data at the same time.
  2. Difference in distributed behavior: As soon as the payload crosses a process boundary, it ceases to be shared, so changing the topology changes the data's behavior.

There are a number of different approaches for dealing with this, each a trade-off in performance and/or usability. I'll use caching as the use case, since it's a bit more universal than message passing, but the same patterns apply.

Cloning

A naive implementation of a cache might just be a dictionary. Sure, you've wrapped the dictionary access with a mutex, so that you don't get corruption accessing the data. But multiple threads would still have access to the same instance. If you aren't aware of this sharing, expect to spend lots of time trying to debug this behavior. If you are unlucky it's not causing crashes but causes strange data corruption that you won't even know about until your data is in shambles. If you are lucky the program crashes because of an access violation of some sort.
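To make the problem concrete, here is a minimal sketch of such a naive cache in Ruby. The class name `NaiveCache` and the sample data are made up for illustration; the point is only that the mutex protects the dictionary, not the objects stored in it.

```ruby
require 'thread'

# Hypothetical naive cache: a Hash guarded by a Mutex.
# The lock serializes access to the Hash itself, but every caller
# still gets a reference to the very same stored object.
class NaiveCache
  def initialize
    @lock = Mutex.new
    @items = {}
  end

  def put(key, value)
    @lock.synchronize { @items[key] = value }
  end

  def get(key)
    @lock.synchronize { @items[key] }
  end
end

cache = NaiveCache.new
profile = { "name" => "example", "tags" => ["geek"] }
cache.put("profile", profile)

# A reader changes what it got back from the cache ...
reader_view = cache.get("profile")
reader_view["tags"] << "ruby"

# ... and both the originator and the cache now see that change,
# because all three references point at one object.
puts profile["tags"].inspect              # ["geek", "ruby"]
puts cache.get("profile")["tags"].inspect # ["geek", "ruby"]
puts reader_view.equal?(profile)          # true
```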

Easy, we'll just clone the data going into the cache. Hrm, but now two threads getting the value are still messing with each other. Ok, fine, we'll clone it coming out of the cache as well. And no, we can't drop the copy going in: if the original thread keeps manipulating its instance while others are getting the data, the cached value keeps changing. That kind of invalidates the purpose of caching data.

So, with cloning we have to copy the data going in and coming back out. That's quite a bit of copying and in the case that the data goes into the cache and expires before someone uses it, it's a wasted copy to boot.
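A rough Ruby sketch of the copy-in, copy-out approach might look like the following. `CloningCache` is a hypothetical name, and using a `Marshal` round trip for the deep copy is my assumption; it only works for data that `Marshal` can dump (no procs, IO objects and the like).

```ruby
require 'thread'

# Hypothetical cache that deep-copies on the way in and on the way out.
class CloningCache
  def initialize
    @lock = Mutex.new
    @items = {}
  end

  # Copy in: later changes by the originator cannot reach the cache.
  def put(key, value)
    copy = deep_copy(value)
    @lock.synchronize { @items[key] = copy }
  end

  # Copy out: changes by one reader cannot reach the cache or other readers.
  def get(key)
    stored = @lock.synchronize { @items[key] }
    stored.nil? ? nil : deep_copy(stored)
  end

  private

  # A Marshal round trip is one simple way to copy a whole object graph.
  def deep_copy(value)
    Marshal.load(Marshal.dump(value))
  end
end

cache = CloningCache.new
original = { "tags" => ["geek"] }
cache.put("profile", original)

original["tags"] << "changed after put"      # originator keeps mutating
first = cache.get("profile")
first["tags"] << "changed by reader"         # reader mutates its copy

puts cache.get("profile")["tags"].inspect    # ["geek"]
```

Every `put` and every `get` pays for a full copy here, which is exactly the cost described above.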

Immutability

If you've paid any attention to concurrency discussions, you've heard the refrain from the functional camp that data should be immutable. Every modification of the data should produce a new copy, with the original unchanged. This is certainly ideal for sharing data without sharing state. It's also a degenerate version of the cloning approach above, in that we are constantly cloning, whether we need to or not.

Unless your language supports immutable objects at a fundamental level, you are likely to be building this by hand. There are certainly ways of mitigating its cost, such as lazy cloning or journaling, i.e. figuring out when to copy what in order to stay immutable. But likely you are going to be building a lot of plumbing.

But if the facilities exist and if the performance characteristics are acceptable, Immutability is the safest solution.

Serialization

So far I've ignored the distributed case, i.e. sending a message across process boundaries or sharing a cache between processes. Both Cloning and Immutability rely on manipulating process memory. The moment the data needs to cross process boundaries, you need to convert it into a format that can be re-assembled into the same graph, i.e. you need to serialize and deserialize the data.

Serialization is another form of Immutability, since you've captured the data state and can re-assemble it into the original state with no ties to the original instance. So Serialization/Deserialization is a form of Cloning and can be used as an engine for Immutability as well. And it goes across the wire? Sign me up, it's all I need!
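As a small illustration of serialization acting as a clone, here is a sketch using Ruby's built-in `Marshal`. The helper name `round_trip` and the sample payload are hypothetical; in a real distributed setup the bytes would be written to a socket or a shared store instead of being loaded again in the same process.

```ruby
# Hypothetical helper: serialize a value to bytes, then rebuild it.
# The byte string is the only thing that would have to cross a
# process boundary.
def round_trip(value)
  bytes = Marshal.dump(value)   # capture the state as a String of bytes
  Marshal.load(bytes)           # re-assemble an equal, independent graph
end

payload  = { "id" => 42, "tags" => ["geek", "ruby"] }
received = round_trip(payload)

puts received == payload        # true: same state
puts received.equal?(payload)   # false: different instance

# Changes on the receiving side no longer touch the original.
received["tags"] << "changed by receiver"
puts payload["tags"].inspect    # ["geek", "ruby"]
```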

Just like Immutability, if the performance characteristics are acceptable, it's a great solution. And of course, not all serializers are equal. .NET's default serializer, I believe, exists as a schoolbook example of how not to do it. It's by far the slowest, biggest and least flexible one. At the other end of the scale, Google's protobuf is the fastest and most compact I've worked with, but there are some flexibility concessions to be made. BSON is a decent compromise when more flexibility is needed. A simple, fast and small enough serializer for .NET that I like is @karlseguin's Metsys.Little. Regardless of the serializer, even the best one is still a lot slower than copying in-process memory, never mind not having to copy that memory at all.

Freeze

It would be nice to avoid the implicit copies and only copy or serialize/deserialize when we need to. What we need is a way for the originator to declare that no more changes will be made to the data, and for the receivers of the data to declare whether they intend to modify the retrieved data, providing the following usage scenarios:

  • Originator and receiver won't change the data: same instance can be used
  • Originator will change data, receiver won't: need to copy in, but not coming out
  • Originator won't change the data, receiver will: can put instance in, but need to copy on the way out

In Ruby, freeze is a core language concept (although I confess I don't know how to get a mutable instance back again, or whether this works on object graphs as well). To let the originator and receiver declare their intended use of data in .NET, we could require data payloads to implement an interface, such as this:

  public interface IFreezable<T> {
    bool IsFrozen { get; }

    void Freeze(); // freeze instance (no-op on frozen instance)
    T FreezeDry(); // return a frozen clone or if frozen, the current instance
    T Thaw();      // return an unfrozen clone (regardless whether instance is frozen)
  }

On submitting the data, the container (cache or message pipeline) will always call FreezeDry() and store the returned instance. If the originator does not intend to modify the instance submitted further, it can Freeze() it first, turning the FreezeDry() that the container does into a no-op.

On receipt of the data, the instance is always frozen, which is fine for any reference use. But should the receiver need to change it for local state tracking, or submitting the changed version, it can always call Thaw() to get a mutable instance.
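For comparison, the same contract can be sketched in Ruby on top of the built-in `freeze`. Two facts about Ruby matter here: `Object#freeze` is shallow (it freezes the object, not the objects it references) and there is no unfreeze, but `dup` returns an unfrozen shallow copy. The names `Freezer`, `freeze_deep`, `freeze_dry`, `thaw` and `FreezingCache` below are hypothetical, and the sketch assumes payloads are plain data (Hash, Array, String, numbers) that were frozen with `freeze_deep`, so a frozen top level implies a frozen graph.

```ruby
require 'thread'

# Hypothetical helpers mirroring Freeze / FreezeDry / Thaw for plain data.
module Freezer
  module_function

  # Freeze(): freeze the whole graph in place (Object#freeze alone is shallow).
  def freeze_deep(value)
    case value
    when Hash  then value.each { |k, v| freeze_deep(k); freeze_deep(v) }
    when Array then value.each { |v| freeze_deep(v) }
    end
    value.freeze
  end

  # FreezeDry(): the instance itself if already frozen, else a frozen clone.
  def freeze_dry(value)
    value.frozen? ? value : freeze_deep(Marshal.load(Marshal.dump(value)))
  end

  # Thaw(): always an unfrozen deep copy, whether or not value is frozen.
  def thaw(value)
    Marshal.load(Marshal.dump(value))
  end
end

# Hypothetical container: freeze-dries on the way in, never copies on the way out.
class FreezingCache
  def initialize
    @lock = Mutex.new
    @items = {}
  end

  def put(key, value)
    frozen = Freezer.freeze_dry(value)
    @lock.synchronize { @items[key] = frozen }
  end

  def get(key)
    @lock.synchronize { @items[key] }
  end
end

cache = FreezingCache.new

# Originator is done with the data: it freezes first, so no copy is made.
settings = Freezer.freeze_deep({ "tags" => ["geek"] })
cache.put("settings", settings)
puts cache.get("settings").equal?(settings)   # true

# Originator keeps mutating: the cache holds its own frozen clone.
draft = { "tags" => ["geek"] }
cache.put("draft", draft)
draft["tags"] << "wip"
puts cache.get("draft")["tags"].inspect       # ["geek"]

# Receiver wants to modify: it thaws a private, mutable copy.
mine = Freezer.thaw(cache.get("draft"))
mine["tags"] << "local"
puts mine["tags"].inspect                     # ["geek", "local"]
```

Readers that only look at the data pay nothing, and the copies happen only for an originator that did not freeze or a receiver that wants to write.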

While IFreezable certainly offers some benefits, it'd be a pain to add to every data payload we want to send. This kind of plumbing is a perfect scenario for AOP, since it's a concern of the data consumer, not of the data. In my next post, I'll talk about some approaches to avoid the plumbing. In the meantime, the WIP code for that post can be found on github.