

Backing up WordPress with git

To back up my WordPress install, I used to do a mysqldump of the DB, bzip it, separately tar/bzip the install directory and rsync the timestamped files to my backup server. This worked well and was a simple enough setup. A couple of days ago, however, I decided to update WordPress and all extensions and noticed that some of my manual tweaks were gone.

Now, I have to say that the install/upgrade system of WordPress is one of the simplest and most trouble-free I've used in both commercial and free software -- this is not a complaint about that system. Clearly I manually tweaked files, which means I inherited the responsibility of migrating those tweaks forward. But it did get me thinking about a different upgrade strategy. Even though I had the backups to determine the changes I needed to re-apply, the process was annoying: untar the backup and manually run diffs between live and backup. The task reminded me too much of how I deal with code changes, so why wasn't I treating this like code?

A blog is (mostly) an append-only system

Sure, there's some editing, but most of it is already revisioned, which makes it effectively append-only. That means all WordPress assets, including the DB dump, should be an ideal candidate for revision control. Most of the time the only changes are additions, and when files do change, that represents an update to an existing system and should be tracked with change history. If the live install were under revision control, you could just run the upgrade, do a diff to see the local changes, tweak/revert them one at a time, then commit the finished state.

Setting up git and importing old revisions

My hierarchy looks like this:

 .../iloggable/
              /wordpress
              /backup

Inside backup, I kept the tar'ed-up copies of the wordpress directory with a date suffix, as well as dated, bzipped mysqldumps of the DB. I generally deleted older backups after rsync, since on the live server I only cared about having the most recent backup.

The first thing I did was create a mirror of this hierarchy. I used an old WordPress tarball for the wordpress directory and bunzipped the DB archive from the same date as iloggable.sql, the only file in backup, since for the git repo I no longer needed the other backups, only the most current database dump. I then ran git init inside of iloggable. I also created a .gitignore with wordpress/wp-content/cache in it to avoid capturing the cache data.

I added all files and committed this state. I then unarchived the subsequent backups, copied them on top of the mirror hierarchy and added/committed those files in succession, as sketched below. At this point I had a single git repo of the backup history I had kept. Now I could review previous states via git log and git diff.
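
In shell terms, the one-time import looked roughly like this (the archive names, dates and paths are placeholders for whatever backups I actually had, not a script to copy verbatim):

  # inside the mirrored iloggable/ hierarchy, after the initial commit
  for d in 2010-07-01 2010-08-01 2010-09-01; do
    tar -xjf /path/to/old-backups/wordpress-$d.tar.bz2           # refresh the wordpress dir
    bunzip2 -c /path/to/old-backups/iloggable-$d.sql.bz2 > backup/iloggable.sql
    git add -A
    git commit -m "imported backup from $d"
  done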

Turning the live copy into the git repo

Finally, I copied the .git directory into the live hierarchy. Running git status immediately showed me the changes on production since the last backup. I deleted the old backup archives and again added/committed the resulting "changes". That gave me a git repo of the current state, which I pushed to my backup server.

Now, my backup script just overwrites the current mysqldump, does a git commit -a -m'$timestamp' and a git push. From now on, as I do tweaks or upgrade WordPress, I can do a git commit before and after, and I have an exact change history for the change.
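
For reference, the whole backup script now amounts to something like this minimal sketch (the install path, database name and remote name are assumptions about my setup, not a drop-in script):

  #!/bin/sh
  # refresh the dump, commit whatever changed, push it off-site
  cd /var/www/iloggable                       # assumed install location
  mysqldump iloggable > backup/iloggable.sql  # credentials come from ~/.my.cnf
  git add -A
  git commit -a -m "$(date +%Y-%m-%d-%H%M)"
  git push origin master                      # origin points at the backup server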

But what about your Promise?

See what I did there? Err... yeah, I know it's horrible. I apologize.

I did want to post an update about Promise, because I've gone radio-silent since I finished up my series about features and syntax. I've started a deep dive into the DLR, but mostly got sidetracked with learning antlr since I don't want to build the AST by hand for testing. However, coming up with a grammar for Promise is the real stumbling block. So that's where I'm currently at, spending a couple of hours here and there playing with antlr and the grammar.

In the meantime, I've been doing more ruby coding over the last couple of weeks and even dove back into perl for some stuff, and the one thing I am more sure of is that I find dynamically typed code incredibly tedious to work with. The lack of static analysis and even simple contracts turns dynamic programming into a task of memorization, terse syntax and text search/replace. I'm not saying that the static type systems of yore aren't equally painful, with the hoops you have to jump through to explain contracts with any flexibility to the compiler, but given the choice I tend towards that end. This experience just solidifies my belief that a type system like Promise's, i.e. types describe contracts, but classes aren't typed other than by their ability to quack appropriately, would be so much more pleasant.

Oh, you can keep the disc, but the bits ain't yours

After writing "Maybe it's time to stop pretending we buy software?", I talked to a couple of gamer friends about their purchasing habits. The concept of DLC that has a strong "withheld content" smell came up, and whether this was a "first buyer bonus" or "selling crippled products" had no straightforward, agreed-upon answer. But what did emerge is that pricing of games is a key factor in why used sale and purchase are considered a right. The sentiment was that at ~$60 there is an expectation that the game has a residual value you can recoup should it not justify itself as a keeper. Which, of course, is itself an indicator that our usage patterns for games are much more closely aligned with a rental than a purchase relationship. In particular, one friend uses Amazon's Trade-In store as a form of game rental. Depending on how many games you play, it is a better deal than Gamefly.

Now it turns out that arguing about whether used games are crippled or not may not even be an issue in the future. Ars Technica did a great summary called "No, you don't own it: Court upholds EULAs, threatens digital resale" of the US Court of Appeals for the Ninth Circuit ruling in Vernor v. Autodesk. The gist is that EULAs are enforceable and that you may really only own a non-transferable license. In the light of keeping your upgrade while selling the old version, that makes sense to me. Of course fairness should dictate that you can sell your license. Then again, fairness should also dictate that you don't make copies of the software for all your friends. So, given the unenforceability of fairness, software companies use draconian EULAs and consumers have chosen to ignore them out of hand. This legal decision, however, has the potential to escalate this conflict, and if companies go after used game stores, used DVD stores, etc., I predict that piracy will really run rampant, as consumers take the only option available to them in fighting rules that violate their sense of fairness.

Physical products engender First Sale Doctrine expectations

I personally have not bought a new Xbox game, relying on my amazon wishlist for those. Of all the games I've played on the Xbox, only GTA4 has felt like it justified its full price. The ones I have bought were used, and the ones that had no replay value I sold. After all, I had a box with a disc sitting there, so of course I can sell that box.

I have, however, bought plenty of games on Steam. It's a digital sale -- I can install a game on any computer when I want to play it, but I can't ever sell it or even let someone borrow it. Yet I am happy with those purchases. Ok, what is wrong with me? The difference, if I were to try to put a finger on it, lies in a combination of pricing and purchasing experience.

New PC games are usually at least $10 cheaper. Whether you claim that this price difference is historical or that consoles are recouping hardware costs, it makes a new game easier to digest. Add to that that Steam has mastered the art of the long tail: reducing prices for older games, frequently running brief yet radical sales, and even adding games not originally released on Steam along with patches and support for newer features such as cloud save-game storage. Finally, with Steam (even if this is more Valve itself than anyone else), you usually get a longer life out of the game, with updates and free multiplayer.

The purchasing itself further severs the physical ownership bond you have with boxed games. Aside from a less painful price point, it's simple, immediate gratification: being able to buy a game at any time of day or night. You also generally don't run an installer; you just buy and wait until Steam tells you that the game is ready to play. In all respects the experience feels like a service, not a product, which reduces the feeling that you own something you can resell.

As a game dev, which seems like the better way to deal with the fact that only some percentage of players will buy your game at full price? Only make money at full-price sales and try to encourage new purchases by devaluing used games with exclusive DLC, etc.? Or sell electronically, cut out used games entirely, and attract those not willing to pay full price by offering sales later? After all, 9 months after release, $19.99 (like Left 4 Dead right now) still beats not seeing a dime.

While the Vernor v. Autodesk decision may embolden publishers to crack down on used sales, I sure hope more will follow Valve's model. After all, gamers generally don't talk fondly of publishers, but Valve is almost uniformly a hero to the community and that's while preventing gamers from selling their games. Sounds like they've got a good model going.

People have no business driving on the highway

Image courtesy of Atwater Village Newbie

I'm going to go rather deeply off-topic and venture into tl;dr territory: every time I drive through LA or am on the long 4-lane interstate corridors of Barstow-Las Vegas or the central valley, my mind spends a lot of time contemplating how highway driving is such an inefficient process. It's a perfect scenario of continuous lanes with defined entry/exit/merge points. You get on at one point and off at some other point. Having to actually drive the vehicle between those two points is not only a misapplication of resources, but human nature seems to ensure that it'll always be slower than it has to be.

Why autonomous highway vehicles won't happen (anytime soon)

Before I make the case why and how autonomous highway travel could happen, let's just get the naysaying out of the way. "Won't" may be strong, but since my objections are based on people, not on technology, I can't foresee this change happening in the near term. Long after the technological hurdles are crossed, the psychological ones, fear and self-determination, are likely to linger.

Fear of yielding control to machines is as old as machines. We're deeply suspicious of anything that wants to take over a task we believe requires our own skillset. Even if repeatedly proven wrong, we believe that a human being can do things better than any machine. Disregarding the thousands of people that die in car accidents due to their own failings (exceeding their own or their car's reaction capabilities, driving impaired, etc.), we are willing to accept those deaths as the cost of driving. But should a single person die because of a computer malfunction, all computer-controlled cars must immediately be suspended. We only have to look at the recent, false accusation that Priuses were running amok because of a faulty on-board computer, and the ensuing public outcry, as proof.

And even if we trusted cars to be better drivers, we still would not yield control, because we want to be the ones that decide what to do next. This is more true in car cultures like the US, but the need for self-determination means that we want to be able to drive where and how we want at all times (ignoring that we have already agreed to a myriad of rules of the road). Maybe we want to cross three lanes to get off at that exit. Or we want to weave through traffic. After all, our superior cognitive skills will get us there faster than flowing with the computer-controlled pack, right?

How could it work?

"There's just too much variability and unpredictability involved in driving for computers to take over." Well, not so fast. On surface streets that's certainly true. There are so many unexpected factors that require decisions not based on hard rules -- bikes, pedestrians, ambiguous signage, bad directions, etc. -- that daily driving will stay out of autonomous vehicles' reach for a while. But highways are different. 99% of all unexpected decision making on highways is due to humans driving in the first place. If you didn't have to react to the unpredictable cars around you, it's a simple set of rules: there are lanes, there are entrance and exit points, there are lane merges and splits, and with communication at light speed, reacting to conditions created by another car would be far more reliable than the visual detection and reaction of a driver.

So let's say highways are a controlled environment that can be managed by today's technology. How would something like this come to pass, especially since we can't just set up a new, separate highway system and can't turn it on overnight?

Autonomous vehicles

One fear, and a realistic obstacle, in computer-controlled cars is the central control of all traffic, which even with redundancy is seen as a single point of failure. And extending trust in computers to trusting some massive government-controlled computer is a special leap that's spawned a hundred dystopian sci-fi stories. For this system to have a chance, each car needs to be in control of itself. People will trust their own cars before they trust an outside entity.

You would pull onto the entrance ramp and tell the car where you want to get off; the car would take over and merge into the traffic flow, and on exiting at your destination it would hand control back, or stop if it sensed that you weren't acknowledging the transfer of control. I'll cover how this is possible next, but the important concept is that it's really just an auto-pilot for your car.

Recognition of the static environment

In order for your car to work on auto-pilot, it needs to have a way to recognize entrances, exits, lanes, etc. This could be done with a combination of GPS markers and RFID: GPS for the layout of major features, such as interchanges, entrances and exits, and RFID to determine boundaries, etc. This static environment can be built out and expanded one highway at a time, and the combination of GPS and RFID means that there is a general expectation plus a local verification of that expectation, i.e. a physical safeguard to override outdated data.

Recognition of the dynamic environment

Just as important as recognizing the lanes is recognizing cars and other obstacles. By using RFID, radar and/or visual recognition and WIFI networking, cars would be able to detect surrounding cars as well as communicate speed changes and negotiate merges. This communication would also allow the forwarding of conditions far ahead without requiring a central traffic computer. It's basically peer-to-peer traffic control. Since the computers would lack the ego of drivers, merges would not require sudden stops that ripple for miles behind and cars could drive faster and closer while still being safer.

The awareness of all other autonomous vehicles and the propagation of information also allows the detection and avoidance of out-of-system obstacles, such as physical objects, cars with malfunctioning systems or rogue drivers who are controlling their cars manually. Once one of these conditions is detected, it might trigger manual control for everyone, which would just return us to the crappy situation we already have, but it still wouldn't be sudden since traffic ahead would warn our cars long before we'd encounter it.

Oh, right, then there's reality

All the technology to bring this about exists today. Mapping our highways for GPS is already done. Implanting RFID markers is no more complicated than existing highway maintenance. Converting the fleet will take a while, but we could easily start with HOV lanes as autonomous lanes and add more lanes until the entire highway and fleet is converted. Sorry, classic cars, you will be relegated to surface streets or require transport. But considering your polluting nature, that's a good thing.

But let's say the government did decide to undertake this. The implementation reality would be lobbying by large government contractors to create proprietary systems, attach patents to the tech and deliver inferior technology (just look at voting machines). They'd create unreliable crap that would erode whatever trust in autonomous vehicles people could muster. Maybe the government would require some standard, but the development of that standard would be a pissing match between car conglomerates that ends up with something as useless as Cablecard and still locks out any innovative application. Finally, the hunger for data would mean that all this peer-to-peer communication and travel data would be an irresistible analytics goldmine for upselling car, travel, etc. products and services, turning the autonomous system into some kind of giant big brother of movement. Of course, considering present consumer behavior, the big brother scenario would probably not act as an obstacle.

I guess I'm going to continue to be stuck behind the guy in the left lane whose speed is the righteous amount over the limit and who only accelerates when his ego is threatened by me passing him on the right. And I'll continue to have to hit the brakes, or react to someone else hitting theirs, because someone decided that their lane change was of higher priority than the flow of the remaining traffic. All of which is completely unnecessary and counter-productive for everyone on the road, when highway travel could be as simple as treating your car as your personal travel compartment in a massive compartment-routing system. Well, a geek can dream.

libBeanstalk.NET, a Beanstalkd client for .NET/mono

Image courtesy of jepeters74

A couple of years back I wrote a store-and-forward message queue called simpleMQ for Vmix. A year later, Vmix was kind enough to let me open source it and I put it up on sourceforge (cause that was the place back in the day). But it never got any documentation or promotion love from me, as I was much too busy building notify.me and using simpleMQ as its messaging backbone. Over the last couple of years, simpleMQ has served us incredibly well at notify.me, passing what must be billions of messages by now. But it does have warts, and I have a backlog of fixes/features I've been meaning to implement.

Beanstalkd: simple & fast workqueue

Rather than invest more time in simpleMQ, I've been watching other message/work queues to see whether there is another product I could use instead. I've yet to find a product that I truly like, but Beanstalkd is my favorite of the bunch. Very simple, fast and with optional persistence, it addresses most of my needs.

Beanstalkd's protocol is inspired by memcached. It uses a persistent TCP connection, but relies on connection state only to determine which "tube" (read: workqueue) you are using and to act as a work-timeout safeguard. The protocol consists of ASCII verbs with binary payloads and uses yaml for structured responses.
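
To give a flavor of the wire format (reproduced from memory of the 1.3 protocol, so treat the exact tokens as approximate rather than authoritative), a put/reserve/delete exchange looks roughly like this:

  put 0 0 120 5        <- priority, delay, time-to-run, payload bytes
  hello                <- 5 bytes of payload
  INSERTED 42          <- job id assigned by the server

  reserve
  RESERVED 42 5        <- job id and payload size
  hello

  delete 42
  DELETED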

Tubes are created on demand and destroyed once empty. By default beanstalkd is in-memory only, but it can use a binary log to store items and recover the in-memory state by log playback. I had briefly looked at zeromq, but after finding out that its speed relies on no persistence, I decided to give it a skip. zeromq might be web scale, but I prefer a queue that doesn't degrade to behaving like /dev/null :) Maybe my transactional RDBMS roots are showing, but I have a soft spot for at least journaling data to disk.

One concept of beanstalkd that I'm still conflicted about is that work is given a processing time-out (time-to-run) by the producer of the work, rather than having the consumer of the work declare its intended responsiveness. Since the producer doesn't know how soon the work gets picked up, i.e. time-to-run is not a measure of work-item staleness, I don't see a great benefit in having the producer dictate the work terms.

The other aspect of work distribution beanstalkd lacks for my taste is the ability to produce work in one instance and have it forwarded to another instance when that instance is available, i.e. store-and-forward queues. I like to keep my queues on the current host so I can produce work without having to rely on the uptime of the consumer or some central facility. However, store-and-forward is an implementation detail I can easily fake with a daemon on each machine that acts as a local consumer and distributor of work items, so it's not something I hold against beanstalkd.

libBeanstalk.NET

Notify.me being a mix of perl and C#, I needed a .NET client. Since no protocol-complete one existed, and given the simplicity of the Beanstalkd protocol, I opted to write my own and have released it under Apache 2.0 on github.

I've not put DLLs up for download, since the API is still somewhat in flux as I continue to add features, but the current release supports the entire 1.3 protocol. By default it considers all payloads as binary streams, but I've included extension methods to handle simple string payloads:

  // connect to beanstalkd
  using(var client = new BeanstalkClient(host, port)) {

    // put some data
    var put = client.Put("foo");

    // reserve data from queue
    var reserve = client.Reserve();
    Console.WriteLine("data: {0}", reserve.Data);

    // delete reserved data from queue
    client.Delete(reserve.JobId);
  }

The binary surface is just as simple:

  // connect to beanstalkd
  using(var client = new BeanstalkClient(host, port)) {

    // put some data
    var put = client.Put(100, 0, 120, data, data.Length);

    // reserve data from queue
    var reserve = client.Reserve();

    // delete reserved data from queue
    client.Delete(reserve.JobId);
  }

I've tried to keep the interface IBeanstalkClient as close as possible to the protocol verb signatures and rely on extension methods to create simpler versions on top of that interface. To facilitate extensions that provide smart defaults, the client also has a Defaults member that can be used to initialize those values.
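
To illustrate the pattern (and only as an illustration: the member names on Defaults and the exact binary Put overload are my assumptions, not the library's actual surface), a string convenience extension might look roughly like this:

// hypothetical sketch; assumes using System.IO, System.Text and the libBeanstalk namespace
public static class BeanstalkClientStringExtensions {
    public static PutResponse Put(this IBeanstalkClient client, string message) {
        var payload = Encoding.UTF8.GetBytes(message);
        using(var stream = new MemoryStream(payload)) {
            // fall back to the client's Defaults for priority, delay and time-to-run
            return client.Put(
                client.Defaults.Priority,
                client.Defaults.Delay,
                client.Defaults.TimeToRun,
                stream,
                payload.Length);
        }
    }
}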

The main deviation from the protocol is how I handle producer and consumer tubes. Rather than have a separate getter and setter for the tube that put will enter work into, I simply have a settable property CurrentTube. And rather than surfacing watch, ignore and listing of consumer tubes, the client includes a special collection, WatchedTubes, with the following interface:

interface IWatchedTubeCollection : IEnumerable<string> {
    int Count { get; }
    void Add(string tube);
    bool Remove(string tube);
    bool Contains(string tube);
    void CopyTo(string[] array, int arrayIndex);
    void Refresh();
}

I was originally going to use ICollection<string>, but Clear() did not make sense, and I wanted a manual method to reload the list from the server, which is exposed via Refresh(). Under the hood, watched tubes is a hashset, so adding the same tube multiple times has no effect, nor is the order of tubes in the collection guaranteed.

Future work

The client is functional and can do everything that Beanstalkd offers, but it's really just a wire-protocol client, akin to dealing with files as raw streams. To make this a useful API, it really needs to take the 90% use cases and remove any friction and repetition they would encounter.

Connection pooling

BeanstalkClient isn't, nor is it meant to be, thread-safe. It assumes you create a client when you need it and govern access to it, rather than sharing a single instance. This was motivated by Beanstalkd's behavior of storing tube state as part of the connection. Given that I encourage clients to be created on the fly to enqueue work, it makes sense that under the hood clients should use a connection pool, both to re-use existing connections rather than constantly opening and closing sockets and to limit the maximum number of sockets a single process tries to open to Beanstalkd. Pooling wouldn't mean sharing a connection between clients, but handing off connections to new clients and returning them to a pool, to be closed on idle timeout, once the client is disposed.
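
A minimal sketch of what that hand-off could look like under the hood (all of the type and member names here are hypothetical, not the branch's actual implementation):

// hypothetical pooling sketch; assumes using System, System.Collections.Generic, System.Net.Sockets
class PooledConnection {
    public TcpClient Socket { get; private set; }
    public DateTime LastUsed { get; set; }

    public static PooledConnection Open(string host, int port) {
        return new PooledConnection { Socket = new TcpClient(host, port), LastUsed = DateTime.UtcNow };
    }

    public void Close() { Socket.Close(); }
}

class ConnectionPool {
    readonly Stack<PooledConnection> _idle = new Stack<PooledConnection>();
    readonly TimeSpan _idleTimeout = TimeSpan.FromSeconds(30);
    readonly object _sync = new object();

    // a new client asks the pool for a connection instead of opening a socket itself
    public PooledConnection Acquire(string host, int port) {
        lock(_sync) {
            while(_idle.Count > 0) {
                var candidate = _idle.Pop();
                if(DateTime.UtcNow - candidate.LastUsed < _idleTimeout) {
                    return candidate;        // hand off a warm socket
                }
                candidate.Close();           // idle too long, discard it
            }
        }
        return PooledConnection.Open(host, port);
    }

    // called when the client is disposed; the connection goes back for the next client
    public void Release(PooledConnection connection) {
        connection.LastUsed = DateTime.UtcNow;
        lock(_sync) {
            _idle.Push(connection);
        }
    }
}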

Most of this work is complete and on a branch, but I want to put it through some more testing before merging it back to master, especially since it will introduce client API changes.

Distributed servers

The Beanstalkd FAQ has this to say about distribution:

Does beanstalk inherently support distributed servers?

Yes, although this is handled by the clients, just as with memcached. The beanstalkd server doesn't know anything about other beanstalkd instances that are running.

I need to take a look at the clients that do implement this and determine what that means for me. I.e. do they use some kind of consistent hashing to determine which node to use for a particular tube, etc. But I do want to have parity with other clients on this.

POCO Producers and Consumers

For me, the 90% use case for a work queue is producing work on some threads/processes/machines and consuming that work on a number of workers. Generally each work item will have some structured fields describing the work to be done, and producers and consumers will use designated tubes for specific types of work. These use cases imply that producers and consumers are separate user stories, that they are tied to specific tubes and that they deal with structured data. My current plan is to address these user stories with two new interfaces that will look similar to these:

public interface IBeanstalkProducer<T> {
  BeanstalkProducerDefaults Defaults { get; }
  PutResponse Put(T entity);
}

public interface IBeanstalkConsumer<T> {
  BeanstalkConsumerDefaults Defaults { get; }
  Job<T> Reserve();
  bool Delete(Job<T> job);
  void Release(Job<T> job);
}

The idea with each is that it's tied to a tube (or tubes for the consumer) at construction time and that the implementation will have a simple way of associating a serializer with the entity T (I will provide protobuf and MetSys.Little support out of the box).
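
Usage would then look something like the following sketch (the tube name, the ResizeImage type, the serializer argument and the Job<T>.Data accessor are all hypothetical, since none of this exists yet):

  // hypothetical usage of the planned typed producer/consumer
  var producer = new BeanstalkProducer<ResizeImage>("resize", new ProtobufSerializer<ResizeImage>());
  producer.Put(new ResizeImage { Source = "raw.jpg", Width = 640 });

  var consumer = new BeanstalkConsumer<ResizeImage>("resize", new ProtobufSerializer<ResizeImage>());
  var job = consumer.Reserve();
  try {
      Process(job.Data);        // assumes Job<T> exposes the deserialized entity
      consumer.Delete(job);     // work done, remove it from the tube
  } catch(Exception) {
      consumer.Release(job);    // hand it back for another worker to retry
  }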

Rx support via IObservable<Job<T>>

Once there is the concept of a Job<T>, it makes sense that reservation of jobs should be exposed as a stream of work that can be processed via Linq. But since items should only be reserved when a subscriber actually accepts the work, the stream should probably be encapsulated in something like this:

public interface Event<T> {
  Job<T> Take();
  Job<T> Job { get; }
}

This way, multiple subscribers can try to reserve an item, and items not reserved by anyone are released automatically.
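
Consumption might then look roughly like this (again purely hypothetical: the ToObservable() bridge and Take() returning null when another subscriber got there first are assumptions about how it could behave):

  // hypothetical: treat the tube as a stream of reservation offers
  IObservable<Event<ResizeImage>> offers = consumer.ToObservable();
  offers.Subscribe(offer => {
      var job = offer.Take();   // actually reserve the job; assumed to return null if it's gone
      if(job != null) {
          Process(job.Data);
          consumer.Delete(job);
      }
  });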

As I work on the future work items, I will also use the library in production so I can get better educated about the real-world behavior of Beanstalkd and what uncovered scenarios the client runs into. There is ok test coverage of the provided behavior, but I certainly want to increase it significantly as I keep working on it.

For the time being, I hope the library proves useful to other .NET developers and would love to get feedback, contributions and issues you may encounter.

Maybe it's time to stop pretending we buy software?

There's been a lot of noise about comments made by THQ's Cory Ledesma about used games. Namely,

"I don't think we really care whether used game buyers are upset because new game buyers get everything. So if used game buyers are upset they don't get the online feature set I don't really have much sympathy for them." -cvg

Well, this has gotten a lot of gamers upset, and my immediate reaction was something like "dude, you are just pissing off your customers." And while Cory may have been the one to say it out loud, actions by EA and others in providing free DLC only to the original buyer and similar original buyer incentives show that the industry in general agrees with his sentiments.

Holding steadfast to my first-sale doctrine rights, I, like most purchasers of games, software and media, strongly believe that we can sell those bits we bought. Of course, EULAs have said nu-uh to that belief for just as long. We purchasers of bits only own a license to those bits; we don't own a product. But just as nobody reads an EULA, everybody believes those EULAs to be unenforceable. I own those bits, man!

So I continue to believe that when I purchase a product, let's say some bits on a DVD, I can sell it again or buy such a product from someone else. It wasn't until I read Penny Arcade earlier this week that I had to admit that, first-sale doctrine notwithstanding, I am not their customer.

Penny Arcade - Words And Their Meanings

But, I thought, just like buying CDs used, I am actually contributing to a secondary market that promotes the brand of the artist. Buying that old CD used makes it more likely that I will buy the next one new, or that I will go to their show when they come to town, etc. Put aside whether this secondary market really has the magical future-revenue effects I ascribe to it; for games there is no such secondary market. As Tycho said in his post accompanying the strip:

"If I am purchasing games in order to reward their creators, and to ensure that more of these ingenious contraptions are produced, I honestly can't figure out how buying a used game was any better than piracy. From the the perspective of a developer, they are almost certainly synonymous." - tycho, penny arcade

Ok, maybe you think the secondary market is sequels that you will buy new because you bought the original used. Never mind that most sequels are farmed out to another development house by the publisher; buying used games, at best, actively encourages the endless milking of sequels rather than new IP. But it's even worse for games, because virtually all games now include some multi-player component, and keeping that running costs real money. You paying for Xbox Live doesn't mean the publisher isn't still paying more cash to Microsoft to run those servers. So every used Modern Warfare player costs the publisher money while only Gamestop made any cash on the sale. So, sure, you own that disc, but you're insane if you think that the developer/publisher owes you anything.

Now, let's extend this to the rest of the software market. Here you can argue a bit more for a secondary market, since software regularly comes out with new versions, encouraging you to upgrade. If you look at that boxed-software revenue cycle, it becomes clear that the added features and version revving exist mainly to extend a product into a recurring revenue stream. And if that's the motivation, it also means we're encouraging developers to spend less on quality and bug fixes (because nobody wants to pay for those), and more on bells and whistles, cause those justify the version rev and with it the upgrade price. In reality, if you use Photoshop professionally you've long ago stopped being a purchaser of boxed software and are instead a subscriber to the upgrade path.

This fickle revenue stream also has an effect on pricing. You may only use Powerpoint once in a while, but you paid to use it 24/7. Or maybe because you don't use it enough you've rationalized pirating it, which only serves to justify a high price tag, since the paying customers are subsidizing the pirates. Either way, the developer inflates the price to smooth out the revenue stream.

The sooner we stop pretending that we buy software and admit that we really just want to rent it, the better. Being addicted to high retail prices, some publishers will certainly try to keep the same pricing as they move to the cloud, but the smart ones will adjust their pricing to attract those buyers who would never have bought the boxed version. Buying metered or by subscription has the potential to focus development on excellence rather than bloat, and the responsiveness and frequent updates of existing services seem to bear that promise out already. It's really in our favor to let go of the idea of wanting a boxed product with resale value.

Loading per solution settings with Visual Studio 2008

If you ever work on a number of projects (company, OSS, contracting), you're likely familiar with coding style issues. At MindTouch we use a number of naming and formatting styles that are different from the Visual Studio defaults, so when I work on github and other OSS projects my settings usually cause formatting issues. One way to address this is to use ReSharper and its ability to store per solution settings. But I still run into some formatting issues with Visual Studio settings that are not being overridden by ReSharper, especially when using Ctrl-K, Ctrl-D, which has become a muscle memory keystroke for me.

While Visual Studio has only per-install global settings, it does at least let you import and export them. You'd figure that could be a per solution setting, but after looking around a bit and getting the usual re-affirmation that Microsoft Connect exists purely to poke developers in the eye, I only found manual or macro solutions. So I decided to write a Visual Studio 2008 Add-in to automate this behavior.

Introducing: SolutionVSSettings Add-in

The goal of the Add-in is to be able to ship your formatting rules along with your solution (which is why I also highly recommend using ReSharper, since you can set up naming conventions and much more). I wanted to avoid any dialogs or user-interaction requirements in the process, but I also wanted to leave open options for overriding settings, in case you use the Add-in but have one or two projects you don't want to accept settings for, or want a default setup to use without a per solution setting. The configuration options are listed in order of precedence:

Use the per solution solutionsettings.config config xml file

The only option for the config file right now is an absolute or relative path (relative to the solution) to the .vssettings file to load as the solution is loaded. You can check the config file in with your solution, or keep it as a local file ignored by source control and point to a settings file in the solution or a common one somewhere else on your system. Currently the entirety of the configuration looks like this:

<config>
  <settingsfile>{absolute or relative path to settingsfile}</settingsfile>
</config>

The purpose of this method, even though it has the highest precedence, is to easily set up an override for a project that already has a settings file that the Add-In would otherwise load.

Use a solution item named 'solution__.vssettings'

If no solutionsettings.config is found, the Add-in will look for a solution item named 'solution__.vssettings' and load it as the solution settings. Since this file is part of the solution and will be checked in with the code, this is the recommended default for sharing settings.

Use environment variable 'solutionsettings.config'

Finally, if no settings are found by the other methods, the Add-in will look for an environment variable 'solutionsettings.config' to find an absolute or relative path (relative to the solution) to a config file (same as above) from which to get the settings file path. This is particularly useful if you have a local standard and don't include it in your own solutions, but need to make sure that local standard is always loaded, even if another solution previously loaded its own settings.

How does it work?

The workhorse is simply a call to:

DTE2.ExecuteCommand("Tools.ImportandExportSettings", "/import:{settingfile}");

The rest is basic plumbing to set up the Add-in and find the applicable settings file.

The Add-In implements IDTExtensibility2 and during initialization subscribes itself to receive all solution-loaded events:

public void OnConnection(object application, ext_ConnectMode connectMode, object addInInst, ref Array custom) {
    _applicationObject = (DTE2)application;
    _addInInstance = (AddIn)addInInst;
    _debug = _applicationObject.ToolWindows.OutputWindow.OutputWindowPanes.Add("Solution Settings Loader");
    Output("loaded...");
    _applicationObject.Events.SolutionEvents.Opened += SolutionEvents_Opened;
    Output("listening for solution load...");
}

I also set up an OutputWindow pane to report what the Add-In is doing. Output panes are a nice, unobtrusive way in Visual Studio to report status without getting in the way unless the user cares.

The event handler for solutions being opened does the actual work of looking for possible settings and, if one is found, loading it:

void SolutionEvents_Opened() {
    var solution = _applicationObject.Solution;
    Output("loaded solution '{0}'", solution.FileName);

    // check for solution directory override
    var configFile = Path.Combine(Path.GetDirectoryName(solution.FileName), "solutionsettings.config");
    string settingsFile = null;
    if(File.Exists(configFile)) {
        Output("trying to load config from '{0}'", configFile);
        settingsFile = GetSettingsFile(configFile, settingsFile);
        if(string.IsNullOrEmpty(settingsFile)) {
            Output("unable to find override '{0}'", configFile);
        } else {
            Output("using solutionsettings.config override");
        }
    }

    // check for settings in solution
    if(string.IsNullOrEmpty(settingsFile)) {
        var item = _applicationObject.Solution.FindProjectItem(SETTINGS_KEY);
        if(item != null) {
            settingsFile = item.get_FileNames(1);
            Output("using solution file '{0}'", settingsFile);
        }
    }

    // check for environment override
    if(string.IsNullOrEmpty(settingsFile)) {
        configFile = Environment.GetEnvironmentVariable("solutionsettings.config");
        if(!string.IsNullOrEmpty(configFile)) {
            settingsFile = GetSettingsFile(configFile, settingsFile);
            if(string.IsNullOrEmpty(settingsFile)) {
                Output("unable to find environment override '{0}'", settingsFile);
            } else {
                Output("using environment config override");
            }
        }
    }
    if(string.IsNullOrEmpty(settingsFile)) {
        Output("no custom settings for solution.");
        return;
    }
    var importCommand = string.Format("/import:\"{0}\"", settingsFile);
    try {
        _applicationObject.ExecuteCommand("Tools.ImportandExportSettings", importCommand);
        Output("loaded custom settings\\r\\n");
    } catch(Exception e) {
        Output("unable to load '{0}': {1}", settingsFile, e.Message);
    }
}

And that's all there is to it.

More work went into figuring out how to build an installer than building the Add-In.... I hate MSIs. At least I was able to write the installer logic in C# rather than one of the more tedious extensibility methods found in InstallShield (the least value for your money I've yet to find in any product) or Wix (a vast improvement over other installers, but still a victim of MSIs).

Installation, source and disclaimer

The source can be found under Apache license at GitHub, which also has an MSI for those just wishing to install it.

NOTE: The MSI and source code come with no express or implied guarantees. It may screw things up in your environment. Consider yourself warned!!

This Add-In does blow away your Visual Studio settings with whatever settings file is discovered via the above methods. So, before installing it you should definitely back up your settings. I've only tested it on my own personal setup, so it may certainly misbehave on someone else's. It's certainly possible that it screws up your settings or even your Visual Studio install. I don't think it will, but I can't call this well tested across environments, so back up and use at your own risk.

Promise: Method slots and operators

Before getting into method slots, here's a quick review of the Promise lambda grammar:

lambda: [<signature>] <expression>;

signature: (<arg1>, ... <argN>[|<return-type>])

arg: [<type>] <argName>[=<init-expression>]

expression: <statement> | { <statement1>; ... <statementN>; }

A lambda can be called with positional arguments either with the parentheses-comma convention ( foo(x,y) ) or the space-separated convention ( foo x y ), or with a JSON object as argument ( foo{ bar: x, baz: y} ).

Method Overload (revised)

When I decided to use slots that you assign lambdas to as methods, I thought I'd be clever and make those slots polymorphic to get around shortcomings I perceived in the javascript model of just attaching functions to named fields. After listening to Rob Pike talk about Go at OSCON, I decided this bit of cleverness did not serve a useful purpose. In Go there are no overloads, because a different signature denotes different behavior and the method name should reflect that difference. Besides, even if you want overload-type behavior in Promise, you can get it via the JSON calling convention:

class Index {
  Search:(|SearchResult) {
     foreach(var keyvaluepair in $_) {
       // handle undeclared named parameters
     }
     ...
  };
}

Basically the lambda signature is used to declare an explicit call contract, but using a JSON object argument, undeclared parameters can just as easily be passed in.

If a method is called with positional arguments instead of a JSON object, the default JSON object will contain a field called args with an array value:

class Index {
  Search: {
    ...
  };
}

Index.Search('foo','documents',10);

// $_ => { args: ['foo','documents',10] }

The above shows a method assigned a lambda without any signature, i.e. it accepts any input and returns an untyped object. Receiving $_.args is not contingent on that signature; it will always be populated, regardless of the lambda signature.

Wildcard Method

A class can also contain a wildcard method to catch all method calls that don't have an assigned slot.

class Index {
  *: {
    var (searchType) = $_._methodname./^Find_(.*)$/;
    if(searchType.IsNil) {
       throw new MethodMissingException();
    }
    ...
  };
}

The wildcard method is a slot named *. Retrieving the call arguments is the same as with any other method without a declared signature, i.e. $_ is used. In addition, the method name used in the call is stuffed into $_ as the field _methodname.

The above example shows a method that accepts any call that starts with Find_ and takes the remainder of the name as the document type to find, such as Find_Images, Find_Pages, etc. This is done using the built-in regex syntax, i.e. you can use ./<regex>/ and ./<regex>/<substitution>/ on any string (or the string an object converts to), similar to perl's m// and s///. Like perl, the call returns a list of captures, so using var with a list of fields, in this case one field called searchType, receives the captures if there is a match.

When a method is called that cannot be found on the Type, it throws a MethodMissingException. A wildcard method is simply a hook that catches that exception. By throwing it ourselves, our wildcard reverts to the default behavior for any method that doesn't match the desired pattern. This also gives parent classes or mix-ins the opportunity to fire their own wildcard methods.

Wildcard methods can only be declared in classes and mix-ins, not on Types. Types are supposed to be concrete contracts. The existence of a wildcard does mean that the class can satisfy any Type contract and can be used to dynamically implement type contracts without having to declare each method (think mocks).

Operators

Operators are really just methods called with the whitespace list syntax:

var x = 3;
var y = x + 5;  // => 8
var z = x.+(5); // => 8

// most operators implement polish notation as appropriate
var v = x.+(5,6); // => 14

Operators are just methods, which means you can assign them yourself as well:

class Query {
 List<Query> _compound;
 op+: {
    var q = Query();
    q._compound.AddRange(_compound);
    q._compound.AddRange($_.args);
    q;
  };
}

The only difference between a normal method slot and an operator slot is that the operator slot has the op prefix for disambiguation.

And now for the hard part

That concludes the overview of the things I think make Promise unique. There's certainly tons more to define for a functioning language, but most of that is going to be very much common syntax. So now it's time to buckle down and dig into antlr and the DLR to see what it will take to get some semblance of Promise functioning.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise via the Promise category link at the top.

Promise: Object notation and serialization

I thought I had only one syntax post left before diving into posts about attempting to implement the language. But starting on a post about method slots and operators, I decided that there was something else I needed to cover in more detail first: the illustrious JSON object.

I've alluded to JSON objects more than a couple of times in previous posts, generally as an argument for lambda calls. Since everything in Promise is a class, JSON objects are a bit of an anomaly. Simply put, they are the serialization format of Promise, i.e. any object can be reduced to a JSON graph. As such they exist outside the normal class regime. They are also closer to BSON, as they retain type information unless serialized to text, and can be serialized on the wire either as JSON or BSON. So it looks like javascript object notation (JSON), but it's really Promise object notation. For simplicity, I'm going to keep calling it JSON tho.

Initialization

Creating a JSON object is the same as in javascript:

var a = {};
var b = [];
var c = { foo: ["bar","baz"] };
var d = { song: Song{name: "ShopVac"} };

The notation accepts hash and array initializers and their nesting, as well as object instances as values. Field names are always strings.

Serialization

The last example shows that you can put Promise objects into a JSON graph, and the object initializer itself takes another JSON object. I explained in "Promise: IoC Type/Class mapping" that passing a JSON object to the Type allows the mapped class constructor to intercept it, but in the default case, it's simply a mapping of fields:

class Song {
  _name;
  Artist _artist;

  Artist:(){ _artist; }
  ...
}

class Artist {
  _name;
  ...
}

var song = Song{ name: "The Future Soon", artist: { name: "Johnathan Coulton" } };

// get the Artist object
var artist = song.Artist;

//serialize the object graph back to JSON
print song.Serialize();
// => { name: "The Future Soon", artist: { name: "Johnathan Coulton" } };

Lacking any intercepts or maps, the initializer will assign the value of name to _name, and when it maps artist to _artist, the typed nature of _artist invokes its initializer with the JSON object from the artist field. Once .Serialize() is called, the graph is reduced to the most basic types possible, i.e. the Artist object is serialized as well. Since the serialization format is meant for passing DTOs, not Types, the type information (beyond fundamental types like String, Num, etc.) is lost at this stage. Circular references in the graph are dropped -- any object already encountered during serialization causes the field to be omitted. It is omitted rather than set to nil so that its use as an initializer does not set the slot to nil, but allows the default initializer to execute.

Above I mentioned that JSON field values are typed, and showed the variable d set to have an object as the value of the field song. That assignment does not cause the Song to be serialized. When assigning values into a JSON object, they retain their type until they are used as arguments for something that requires serialization, or are manually serialized.

var d = { song: Song{name: "ShopVac"} };

// this works, since song is a Song object
d.song.Play(); 

var e = d.Serialize(); // { song: { name: "ShopVac" } }

// this will throw an exception
e.song.Play();

// this clones e
var f = e.Serialize();

Serialization can be called as many times as you want and acts as a clone operation for graphs lacking anything further to serialize. The clone is a lazy operation, making it very cheap. Basically a pointer to the original JSON is returned, and it is only fully cloned if either the original or the clone is modified. This means the penalty for calling .Serialize() on a fully serialized object is minimal, making it an ideal way to propagate data that is considered immutable.

Access and modification

JSON objects are fully dynamic and can be accessed and modified at will.

var x = {foo: "bar"};

// access by dot notation
print x.foo; // => "bar"

// access by name (for programmatic access or access of non-symbolic names)
print x["foo"]; // => "bar"

x.foo = ["bar","baz"]; // {foo: ["bar","baz"]}
x.bar = "baz"; // {bar: "baz", foo: ["bar", "baz"]};

// delete a field via self-reference
x.foo.Delete();
// or by name
x["foo"].Delete();

The reason JSON objects exist as entities distinct from class-defined objects is to provide a clear separation between objects with behavior and data-only objects. Attaching functionality to data should be an explicit conversion from a data object to a classed object, rather than mixing the two, javascript style.

Of course, this dichotomy could theoretically be abused with something like this:

var x = {};
x.foo = (x) { x*x; };
print x.foo(3); // => 9

I am considering disallowing the assignment of lambdas as field values, since they cannot be serialized, thus voiding this approach. I'll punt on the decision until implementation. If lambdas end up as first-class objects, the above would have to be explicitly prohibited, which may lead me to leave it in. If, however, I'd have to manually support this use case, I'm going to leave it out for sure.

JSON objects exist as a convenient data format internally and for getting data in and out of Promise. The ubiquity of JSON-like syntax in most dynamic languages and its easy mapping to object graphs make it the ideal choice for Promise to foster simplicity and interop.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise via the Promise category link at the top.

IronRuby isn't dead, it's been set free on a farm upstate

Yesterday Jimmy Schementi posted his farewell to Microsoft and with it his thoughts on the future of IronRuby. This made it onto twitter almost immediately as "IronRuby is dead". This was shortly followed by a number of voices countering with, "No it's not, it's OSS, step up community!". I think that's rather missing the point.

Microsoft needs Ruby more than Ruby needs Microsoft

I'll even go as far as saying Ruby has no need for Microsoft. But Microsoft has a clear problem with attracting a community of passionate developers building the next generation of apps. There is a large, deeply entrenched community building enterprise apps that is going to stay loyal for a long time, but just as Office will not stay dominant as a monolithic desktop app, the future of development for Microsoft needs to be web-based, and there isn't a whole lot of fresh blood coming in that way.

Fresh blood that is passionate and vocal is flocking to Ruby, Scala, Clojure, node.js, etc. Even the vocal developers on the MS stack are pretty much all playing with Ruby or another dynamic language in some capacity or other. Maybe you think that those people are just a squeaky-wheel minority, and maybe you are right. But minority or not, they are the people who shape the impressions that the next wave of newcomers sees first. It's the bleeding-edge guys that pave the road others will follow.

Startup technology decisions are done via peer networks, not by evaluating vendor marketing messages. Instead of trying to attract and keep alpha geeks, Microsoft is pushing technologies like WebMatrix and Lightswitch, as if drag-n-drop/no-code development wasn't an already reviled stereotype of the MS ecosystem.

Some have said that IronRuby is not in the interest of Microsoft, that it would just be a gateway drug that makes it even easier to jump the MS ship. Sorry, but that cat's long out of the bag. Ruby's simplicity and ecosystem already make jumping ship as easy as could be. And right now, once they jump ship, with no integration story, they will quickly lose any desire to hold on to their legacy stack.

IronRuby, while no panacea for these problems, at least offered a way for people considering .NET to not have to choose between Ruby and .NET. And it had the potential to expose already devoted Ruby fans to what .NET could offer on top (I know that one is a much harder sell). If you look at the Ruby space, Ruby often benefits from having another language backing it up, which is what makes a Ruby front-end with a Scala back-end popular. And if IronRuby were competitive, being able to bridge a Ruby front-end with a C# or F# back-end, with the option to stay in-process, would be a story worth trying to sell.

Let the community foster IronRuby, that's what OSS is all about!

Ok, that sounds idealistic and lovely, but, um, what community are you talking about? The Ruby community? They're doing fine without IronRuby. The .NET community? Well, OSS on .NET is tiny compared to all other stacks.

Most OSS projects are 99% consumers with 1% actual committers. And that's fine; those 99% consumers grow the community, and with it the pool of one-percenters increases as well. But that only works if the project has enough appeal to grow the 99% pool. It is virtually impossible to reach the bulk of the .NET pool without a strong push from Microsoft. I'm always amazed how many .NET developers I meet are completely oblivious to technology not emanating from Redmond. And these are not just people who stumbled into VB because their Excel macros didn't cut it anymore. There are good, smart developers out there who have been well served by Microsoft and have not felt the need to look outside. For a vendor that's a great captive audience, worth a lot of money, so it's been in Microsoft's interest to keep it that way. But that also means that for a project to get momentum beyond alpha geeks on the MS stack, it's gotta be pushed by Microsoft. It needs to be a first-class citizen on the platform, built into Visual Studio, etc.

The IronRuby catch-22

IronRuby has the potential to draw fresh blood into the existing ecosystem, but it won't unless it already has large momentum inside the .NET ecosystem. Nobody is going to use IronRuby instead of Ruby because it's almost as good. You can't get that momentum without leveraging the captive audience. And you can't leverage that captive audience without real support from within Microsoft. The ball's been in Microsoft's court, but apparently it rolled under a table and has been forgotten.