Thursday, February 28, 2013

How I feel as a developer

Here's a blog with a bunch of animated gifs illustrating what it's like to be a developer.

Monday, October 1, 2012

Node.js, the next web development platform?

Node.js has been gaining steam over the last little while, thanks to videos like this, showing how Node.js can handle 1000 concurrent requests. The development community has grown by leaps and bounds and new packages are constantly being added to NPM. In addition to this, Microsoft has thrown its weight behind it and has provided support for Node.js development in Windows Azure and their WebMatrix IDE. The selling features sound pretty good compared to existing frameworks and architectures:
  • Better memory efficiency under high load compared to the traditional thread-based concurrency model
  • No locks, and as a result no deadlocks
  • HTTP is a first-class protocol, which makes it a good foundation for web libraries and frameworks
  • Developers can write code in one language on the client and the server
In order to jump right in and start learning Node.js, I decided to write an adapter for the jugglingdb ORM, which is bundled in Railway.js, a framework very much like Ruby on Rails. The adapter integrates with Windows Azure Table Storage and is available on GitHub and on npm.

From this experience I've learned several things, the most important of which is that Node.js is not ready for primetime. While it's great for doing simple endpoints, complex applications seem to be very hard to troubleshoot and debug. This is due to the use of async callbacks all over the place. While this allows high concurrency, debugging becomes a muddled mess. Here is an example of the getTable function in the TableService class, provided by Microsoft.

TableService.prototype.getTable = function (table, optionsOrCallback, callback) {
  var options = null;
  if (typeof optionsOrCallback === 'function' && !callback) {
    callback = optionsOrCallback;
  } else {
    options = optionsOrCallback;
  }

  validateTableName(table);
  validateCallback(callback);

  var webResource = WebResource.get("Tables('" + table + "')");
  webResource.addOptionalHeader(HeaderConstants.CONTENT_TYPE, 'application/atom+xml;charset="utf-8"');
  webResource.addOptionalHeader(HeaderConstants.CONTENT_LENGTH, 0);

  var processResponseCallback = function (responseObject, next) {
    responseObject.tableResponse = null;
    if (!responseObject.error) {
      responseObject.tableResponse = TableResult.parse(responseObject.response.body);
    }

    var finalCallback = function (returnObject) {
      callback(returnObject.error, returnObject.tableResponse, returnObject.response);
    };

    next(responseObject, finalCallback);
  };

  this._performRequestExtended(webResource, null, options, processResponseCallback);
};

Not only is this hard to read, it's not clear why all of these layers of callbacks are even necessary. This could just be this one library. However, it also seems that Node.js is solving a problem that doesn't exist for the vast majority of people. The Railway.js and Express frameworks are great, but if the problem you are trying to solve is concurrency, you are unlikely to use a heavyweight framework. You are much more likely to build a highly specialized and tuned endpoint using low-level platform constructs.

That is not to say that it's a lost cause. Projects like TypeScript are seeking to make Node.js development much more scalable. Tools like JetBrains WebStorm, with its Node.js plugin, provide a slew of functionality for JavaScript development that you may be used to from other development environments. This includes refactoring, code completion, a nodeunit test runner, and debugging.

Would I use Node.js on my next project? Potentially for small endpoints that may need high concurrency and throughput. However, I am interested in seeing where the development community takes the concept of JavaScript on the server. Perhaps at some point this will become a viable alternative to the existing platforms and frameworks.

Friday, April 27, 2012

Using Sqlite to test database interaction with NHibernate

In a previous post, Stubbing Out NHibernate's ISession, I talked about creating a mock class in order to stub out QueryOver interaction in your classes. In this post I'd like to talk about another approach: using an in-memory Sqlite database to simulate real data interaction. The upside to using an in-memory Sqlite database is that you can quickly verify that your query actually works as you expect. There are a couple of downsides, however. One is that you're using Sqlite as opposed to whatever production-grade database you will be using in deployment. The second is that the setup for your tests can get pretty complex, as you may need to persist a lot of entities.

The first thing you'll need to do is download Sqlite and the appropriate .NET wrapper. These can be found here and here respectively. Next you'll need to create some sort of NHibernate session manager. This class should be able to create an in-memory Sqlite database, create the schema, and open an ISession. Here's what mine looks like:

public class NHibernateSqliteSessionFactory
{
 private static readonly NHibernateSqliteSessionFactory _instance = new NHibernateSqliteSessionFactory();
 private static Configuration _configuration;
 private readonly ISessionFactory _sessionFactory;

 private NHibernateSqliteSessionFactory()
 {
  var model = AutoMap.AssemblyOf<Entity>(new GasOverBitumenConfiguration())
   .UseOverridesFromAssemblyOf<NHibernateSessionFactory>()
   .Conventions.Add(
    DefaultLazy.Always(),
    DynamicUpdate.AlwaysTrue(),
    DynamicInsert.AlwaysTrue())
   .Conventions.AddFromAssemblyOf<Entity>();

  FluentConfiguration fluentConfig = Fluently.Configure()
   .Database(
    new SQLiteConfiguration()
     .InMemory()
     .UseReflectionOptimizer()
   )
   .ProxyFactoryFactory("NHibernate.ByteCode.Castle.ProxyFactoryFactory, NHibernate.ByteCode.Castle")
   .Mappings(m =>
        {
         m.AutoMappings.Add(model);
         m.FluentMappings.AddFromAssemblyOf<Entity>();
        })
   .ExposeConfiguration(c => _configuration = c);

  _sessionFactory = fluentConfig.BuildSessionFactory();
 }

 public static NHibernateSqliteSessionFactory Instance
 {
  get { return _instance; }
 }

 public ISession OpenSession()
 {
  var session = _sessionFactory.OpenSession();
  new SchemaExport(_configuration).Execute(false, true, false, session.Connection, new StringWriter());
  session.BeginTransaction();
  return session;
 }
}

Note that we are using Fluent NHibernate to create the configuration. This can just as easily be accomplished with either traditional or loquacious configuration. Furthermore, the OpenSession method creates a session and sets up the database schema on that session's connection. When the session is closed, the in-memory database is gone.

It might be a good idea to also persist several entities in your database, such that your unit tests are not required to have large set up methods. Your test can then simply open the session, do what is being tested and verify that the result is correct. In your clean up step, it's a good idea to dispose the session.

There you have it. An easy way to set up an in-memory Sqlite database for NHibernate.

Wednesday, December 14, 2011

Stubbing out NHibernate's ISession

When I first started using NHibernate I didn't know much about design patterns. My early attempts were messy, ugly, and unstable. I finally learned about the repository pattern and started using it. Things were great: I was able to write cleaner code and abstract away my use of NHibernate. The thing is, NHibernate's ISession is already an implementation of this pattern, in my opinion, so abstracting an abstraction just adds an extra layer of complexity. That's not to say that you shouldn't use the repository pattern to persist objects, but for basic reads, using the ISession directly has a lot of benefits. (For a more thorough analysis see Ayende's blog here and here.)

There is, however, one advantage to abstracting away the ISession. It allows you to unit test your code in a much easier fashion. Since you can easily mock out your repository object, you can write unit tests for modules that use it. This is much easier than if you used the ISession directly. Whether you use the Criteria, QueryOver, or the Linq Query apis, it's almost impossible to stub out this interaction. If you do happen to stub it out using a mocking framework, your unit test Arrange step would become so messy, that your unit tests would become completely unreadable.

So what's the solution? In a recent project, we decided not to abstract away NHibernate when doing queries. Of course, we ran up against the problem of unit testing the code. The solution was remarkably simple. Since we were using the QueryOver API, all we did was write a QueryOverStub implementation of the IQueryOver<TRoot, TSub> interface. Here's a simple implementation, with unused methods left unimplemented.


    public class QueryOverStub<TRoot, TSub> : IQueryOver<TRoot, TSub>
    {
        private readonly TRoot _singleOrDefault;
        private readonly IList<TRoot> _list;
        private readonly ICriteria _root = MockRepository.GenerateStub<ICriteria>();

        public QueryOverStub(IList<TRoot> list)
        {
            _list = list;
        }

        public QueryOverStub(TRoot singleOrDefault)
        {
            _singleOrDefault = singleOrDefault;
        }

        public ICriteria UnderlyingCriteria
        {
            get { return _root; }
        }

        public ICriteria RootCriteria
        {
            get { return _root; }
        }

        public IList<TRoot> List()
        {
            return _list;
        }

        public IList<U> List<U>()
        {
            throw new NotImplementedException();
        }

        public IQueryOver<TRoot, TRoot> ToRowCountQuery()
        {
            throw new NotImplementedException();
        }

        public IQueryOver<TRoot, TRoot> ToRowCountInt64Query()
        {
            throw new NotImplementedException();
        }

        public int RowCount()
        {
            return _list.Count;
        }

        public long RowCountInt64()
        {
            throw new NotImplementedException();
        }

        public TRoot SingleOrDefault()
        {
            return _singleOrDefault;
        }

        public U SingleOrDefault<U>()
        {
            throw new NotImplementedException();
        }

        public IEnumerable<TRoot> Future()
        {
            return _list;
        }

        public IEnumerable<U> Future<U>()
        {
            throw new NotImplementedException();
        }

        public IFutureValue<TRoot> FutureValue()
        {
            throw new NotImplementedException();
        }

        public IFutureValue<U> FutureValue<U>()
        {
            throw new NotImplementedException();
        }

        public IQueryOver<TRoot, TRoot> Clone()
        {
            throw new NotImplementedException();
        }

        public IQueryOver<TRoot> ClearOrders()
        {
            return this;
        }

        public IQueryOver<TRoot> Skip(int firstResult)
        {
            return this;
        }

        public IQueryOver<TRoot> Take(int maxResults)
        {
            return this;
        }

        public IQueryOver<TRoot> Cacheable()
        {
            return this;
        }

        public IQueryOver<TRoot> CacheMode(CacheMode cacheMode)
        {
            return this;
        }

        public IQueryOver<TRoot> CacheRegion(string cacheRegion)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> And(Expression<Func<TSub, bool>> expression)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> And(Expression<Func<bool>> expression)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> And(ICriterion expression)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> AndNot(Expression<Func<TSub, bool>> expression)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> AndNot(Expression<Func<bool>> expression)
        {
            return this;
        }

        public IQueryOverRestrictionBuilder<TRoot, TSub> AndRestrictionOn(Expression<Func<TSub, object>> expression)
        {
            throw new NotImplementedException();
        }

        public IQueryOverRestrictionBuilder<TRoot, TSub> AndRestrictionOn(Expression<Func<object>> expression)
        {
            throw new NotImplementedException();
        }

        public IQueryOver<TRoot, TSub> Where(Expression<Func<TSub, bool>> expression)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> Where(Expression<Func<bool>> expression)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> Where(ICriterion expression)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> WhereNot(Expression<Func<TSub, bool>> expression)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> WhereNot(Expression<Func<bool>> expression)
        {
            return this;
        }

        public IQueryOverRestrictionBuilder<TRoot, TSub> WhereRestrictionOn(Expression<Func<TSub, object>> expression)
        {
            return new IQueryOverRestrictionBuilder<TRoot, TSub>(this, "prop");
        }

        public IQueryOverRestrictionBuilder<TRoot, TSub> WhereRestrictionOn(Expression<Func<object>> expression)
        {
            return new IQueryOverRestrictionBuilder<TRoot, TSub>(this, "prop");
        }

        public IQueryOver<TRoot, TSub> Select(params Expression<Func<TRoot, object>>[] projections)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> Select(params IProjection[] projections)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> SelectList(Func<QueryOverProjectionBuilder<TRoot>, QueryOverProjectionBuilder<TRoot>> list)
        {
            return this;
        }

        public IQueryOverOrderBuilder<TRoot, TSub> OrderBy(Expression<Func<TSub, object>> path)
        {
            return new IQueryOverOrderBuilder<TRoot, TSub>(this, path);
        }

        public IQueryOverOrderBuilder<TRoot, TSub> OrderBy(Expression<Func<object>> path)
        {
            return new IQueryOverOrderBuilder<TRoot, TSub>(this, path, false);
        }

        public IQueryOverOrderBuilder<TRoot, TSub> OrderBy(IProjection projection)
        {
            return new IQueryOverOrderBuilder<TRoot, TSub>(this, projection);
        }

        public IQueryOverOrderBuilder<TRoot, TSub> OrderByAlias(Expression<Func<object>> path)
        {
            return new IQueryOverOrderBuilder<TRoot, TSub>(this, path, true);
        }

        public IQueryOverOrderBuilder<TRoot, TSub> ThenBy(Expression<Func<TSub, object>> path)
        {
            return new IQueryOverOrderBuilder<TRoot, TSub>(this, path);
        }

        public IQueryOverOrderBuilder<TRoot, TSub> ThenBy(Expression<Func<object>> path)
        {
            return new IQueryOverOrderBuilder<TRoot, TSub>(this, path, false);
        }

        public IQueryOverOrderBuilder<TRoot, TSub> ThenBy(IProjection projection)
        {
            return new IQueryOverOrderBuilder<TRoot, TSub>(this, projection);
        }

        public IQueryOverOrderBuilder<TRoot, TSub> ThenByAlias(Expression<Func<object>> path)
        {
            return new IQueryOverOrderBuilder<TRoot, TSub>(this, path, true);
        }

        public IQueryOver<TRoot, TSub> TransformUsing(IResultTransformer resultTransformer)
        {
            return this;
        }

        public IQueryOverFetchBuilder<TRoot, TSub> Fetch(Expression<Func<TRoot, object>> path)
        {
            return new IQueryOverFetchBuilder<TRoot, TSub>(this, path);
        }

        public IQueryOverLockBuilder<TRoot, TSub> Lock()
        {
            throw new NotImplementedException();
        }

        public IQueryOverLockBuilder<TRoot, TSub> Lock(Expression<Func<object>> alias)
        {
            throw new NotImplementedException();
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<TSub, U>> path)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<U>> path)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<TSub, U>> path, Expression<Func<U>> alias)
        {
            return new QueryOverStub<TRoot, U>(_list);
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<U>> path, Expression<Func<U>> alias)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<TSub, U>> path, JoinType joinType)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<U>> path, JoinType joinType)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<TSub, U>> path, Expression<Func<U>> alias, JoinType joinType)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<U>> path, Expression<Func<U>> alias, JoinType joinType)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<TSub, IEnumerable<U>>> path)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<IEnumerable<U>>> path)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<TSub, IEnumerable<U>>> path, Expression<Func<U>> alias)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<IEnumerable<U>>> path, Expression<Func<U>> alias)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<TSub, IEnumerable<U>>> path, JoinType joinType)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<IEnumerable<U>>> path, JoinType joinType)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<TSub, IEnumerable<U>>> path, Expression<Func<U>> alias, JoinType joinType)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, U> JoinQueryOver<U>(Expression<Func<IEnumerable<U>>> path, Expression<Func<U>> alias, JoinType joinType)
        {
            return new QueryOverStub<TRoot, U>(new List<TRoot>());
        }

        public IQueryOver<TRoot, TSub> JoinAlias(Expression<Func<TSub, object>> path, Expression<Func<object>> alias)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> JoinAlias(Expression<Func<object>> path, Expression<Func<object>> alias)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> JoinAlias(Expression<Func<TSub, object>> path, Expression<Func<object>> alias, JoinType joinType)
        {
            return this;
        }

        public IQueryOver<TRoot, TSub> JoinAlias(Expression<Func<object>> path, Expression<Func<object>> alias, JoinType joinType)
        {
            return this;
        }

        public IQueryOverSubqueryBuilder<TRoot, TSub> WithSubquery
        {
            get { return new IQueryOverSubqueryBuilder<TRoot, TSub>(this); }
        }

        public IQueryOverJoinBuilder<TRoot, TSub> Inner
        {
            get { return new IQueryOverJoinBuilder<TRoot, TSub>(this, JoinType.InnerJoin); }
        }

        public IQueryOverJoinBuilder<TRoot, TSub> Left
        {
            get { return new IQueryOverJoinBuilder<TRoot, TSub>(this, JoinType.LeftOuterJoin); }
        }

        public IQueryOverJoinBuilder<TRoot, TSub> Right
        {
            get { return new IQueryOverJoinBuilder<TRoot, TSub>(this, JoinType.RightOuterJoin); }
        }

        public IQueryOverJoinBuilder<TRoot, TSub> Full
        {
            get { return new IQueryOverJoinBuilder<TRoot, TSub>(this, JoinType.FullJoin); }
        }
    }

So you can easily specify the item or list of items you want the query to return in your test.

new QueryOverStub<Entity, Entity>(new List<Entity>())

The only gap is when you're using projections or getting something like the count. To make this implementation work in those situations, we would just need to write implementations for the List<U> and SingleOrDefault<U> methods. In an upcoming blog post I'll talk about how to use an in-memory Sqlite database as an alternative to using this stub.

Monday, November 28, 2011

Writing testable ETL processes with Rhino-ETL

On a recent project we had to integrate several external data sources into the application database. Some of these sources were CSV files, while another was an Oracle database. Our application, meanwhile, used a SQL Server database. Furthermore, some of the data had to be loaded automatically, while other data had to be loaded by a business user. We tossed around several ideas for how to accomplish this: SSIS, Informatica, something custom built. We finally settled on Rhino-ETL, the ETL framework by Ayende Rahien (aka Oren Eini). It looked like it would fit the bill, as it could be integrated into any .NET application and it allowed us to write unit and integration tests around it.

Unfortunately there is a dearth of information around how to use this framework. The only good piece of documentation is this great video tutorial. While a good starting point, I thought I'd write a blog to show how to get started with the framework.

Rhino ETL is based on a pipeline concept. Each process consists of a bunch of operations strung together. Each operation's output is fed into the next operation as input. The operations interact by the following method:

public interface IOperation : IDisposable
{
    ...
    IEnumerable<Row> Execute(IEnumerable<Row> rows);
    ...
}

So each operation takes in a collection of Rows and outputs a collection of Rows. A Row is basically an instance of Dictionary<object, object> with an added twist: no exceptions are thrown if a key does not exist. A Row will simply return null when a key does not exist in the dictionary. Each operation can also leverage the often overlooked C# yield return statement. A typical operation might look something like this:

public class MyOperation : AbstractOperation
{
    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        foreach (var row in rows)
        {
            //process the row
            yield return row;
        }
    }
}

The yield return keyword instructs the compiler to generate an iterator that, for each row, would execute whatever code you had in the foreach statement. This allows us to defer execution of each row to the point in time when it passes through the operation pipeline.

Since your operations expose the Execute method, you can easily write unit tests around each operation, with any kind of data as input. Voila, by using this framework and developing a good suite of unit tests, your ETL process has become far more robust than if you were using a tool like SSIS.

The main thing to note about Rhino-ETL is that you will have to code all of your operations. Typically your operations will inherit from the AbstractOperation class. Some other useful operations:
  • SqlBulkInsertOperation: Set up the schema and do a bulk insert into a table. Useful for inserting large data sets quickly and efficiently.
  • SqlBatchOperation: Batch your SQL operations to reduce server round trips.
  • BranchingOperation: Send rows to multiple operation pipelines.
  • JoinOperation: Join your rows to a result set from another operation.
  • PartialProcessOperation: Typically used with BranchingOperations to create an operation pipeline within a branch.

A process is typically created by inheriting from the EtlProcess class.

public class MyProcess : EtlProcess
{
    protected override void Initialize()
    {
        Register(new MyOperation());
        //register other operations
    }
}

A final note: in order to get any output from your ETL process, you will need to set up log4net. A simple console output can be created by the following entry in your App.config:

<log4net debug="false">
    <appender name="console" type="log4net.Appender.ConsoleAppender, log4net">
      <threshold value="WARN" />
      <layout type="log4net.Layout.PatternLayout,log4net">
        <param name="ConversionPattern" value="%d [%t] %-5p %c - %m%n"/>
      </layout>
    </appender>
    <root>
      <level value="INFO" />
      <appender-ref ref="console"/>
    </root>
</log4net>

and by adding the following line of code to your application startup:

log4net.Config.XmlConfigurator.Configure();

To conclude, while you do give up fancy designers and useful operations like SSIS's Fuzzy Lookup, the benefit gained from being able to write unit tests around your ETL process can be invaluable. Due to its simplicity and the fact that it can be easily integrated into other .NET applications, Rhino ETL is a very useful tool to have in your arsenal.

Wednesday, July 13, 2011

Why you should use a DVCS

Over the last several years DVCSs (Distributed Version Control Systems) have gained enormous acceptance in the development community. There are lots of good reasons to ditch your old Subversion (or god forbid CVS) repository and switch to one of the new kids on the block like Git or Mercurial. Let's take a look at some of these reasons.

Experiment to your heart's content

I'm sure this has happened to everybody. You start developing a feature. You change a few minor things here and there (e.g. add a new property or method to an entity). Then you make a few more changes; and a few more; and a few more. All of a sudden you realize that your approach just isn't going to work. Hold on though, some of the changes you made earlier are still valid and you want to keep them. So you start trying to revert just the later changes, only to give up and revert everything back to the latest copy that was in source control. Begrudgingly you reproduce your earlier work as you curse your source control system.

If you're using a DVCS, all of that can be avoided. Since you have a source control repository right on your own machine tracking your code, you can easily reset your code to one of the commit points along your journey to the dead end you've found yourself in. You can even save your progress in a tag or a branch, in case you want to refer to some of the code in the future.
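Here's a rough sketch of what that recovery could look like with Git. It is only an illustration: the throwaway repository, the file name entity.txt, the commit messages, and the tag name dead-end-experiment are all made up for the example.

```shell
#!/bin/sh
# Sketch: back out of a dead-end experiment while keeping the early, still-valid commits.
set -eu

# Throwaway repository so the example does not touch any real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A baseline commit, followed by a small change that is worth keeping.
echo "baseline" > entity.txt
git add entity.txt
git commit -q -m "Baseline"

echo "new property" >> entity.txt
git commit -q -am "Add a property to the entity (still valid)"
good=$(git rev-parse HEAD)   # remember the last commit worth keeping

# The change that turns out to be a dead end.
echo "doomed approach" >> entity.txt
git commit -q -am "Start the approach that will not work"

# Keep the dead end reachable under a tag, in case it is useful later.
git tag dead-end-experiment

# Move the current branch back to the last good commit.
git reset -q --hard "$good"

cat entity.txt
```

After the reset, the working copy contains only the baseline and the property change, while the abandoned work can still be inspected with git show dead-end-experiment.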

Branch and merge with ease

A common scenario in software development is to do a product release, create a release branch, and continue development on the main line trunk. If a bug is reported in the release, you would check out the release branch, fix the bug and check in your changes. After a few of these changes you can do another release and merge your branch fixes into your main line trunk. At least that's how it's supposed to work. What happens more often is on the last step, when you try to do a merge, you find so many conflicts that it then takes the entire team a day or more to resolve them.

Since a DVCS is built around branching and merging, these operations are not only easy to do, but also painless. Most of the time you won't even need to manually merge anything as the systems are very good at figuring out what was intended. So go ahead, make as many branches as you want, merge them whenever you want, and stop trying to mangle your process with the limitations of your VCS.
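As a minimal sketch of the release-branch workflow described above, here is how it might look in Git. The branch names trunk and release-1.0, the file names, and the commit messages are assumptions made up for the example, and it runs in a throwaway repository.

```shell
#!/bin/sh
# Sketch: fix a bug on a release branch, then merge the fix back into the main line.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # name the main line "trunk" for this example
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Ship a release and branch off it.
echo "v1" > app.txt
git add app.txt
git commit -q -m "Release 1.0"
git branch release-1.0

# Development continues on trunk.
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add a feature on trunk"

# A bug is reported in the release, so fix it on the release branch.
git checkout -q release-1.0
echo "v1 with bug fix" > app.txt
git commit -q -am "Fix a bug in the release"

# Bring the fix back into trunk. The histories diverged, so this creates a merge commit.
git checkout -q trunk
git merge -q --no-edit release-1.0

cat app.txt
ls
```

Because the two branches touched different files, the merge goes through without any manual conflict resolution, and trunk ends up with both the feature and the bug fix.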

Safety and redundancy

One of the great things about having multiple copies of the repository floating around among the development team and the central location is redundancy. In a traditional centralized version control environment, if your repository became corrupted, or the server hard drive failed, you would have to go find a backup and restore it, possibly losing work and holding up development.

In a distributed environment, everyone has a copy of the repository. So if the same scenario were to happen, the fix can be as simple as one of the developers pushing the latest copy to a new shared location. Alternatively, one developer's repository can be chosen to act as the central repository until the server can be rebuilt. That is not to say you shouldn't have backups in a distributed environment, but it is far less catastrophic not to have them.
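To make the recovery scenario concrete, here's a small sketch using Git. Everything in it is simulated locally: server.git stands in for the lost shared repository, rebuilt.git for its replacement, and dev for one developer's clone. These names, along with the branch name trunk, are assumptions for the example.

```shell
#!/bin/sh
# Sketch: rebuild a lost shared repository from a developer's clone.
set -eu

work=$(mktemp -d)
cd "$work"

# The original shared repository and a developer's clone of it.
git init -q --bare server.git
git clone -q server.git dev 2>/dev/null
cd dev
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "work" > file.txt
git add file.txt
git commit -q -m "Some work"
git push -q origin trunk
cd ..

# Simulate the server hard drive failing.
rm -rf server.git

# Create a fresh, empty shared repository and push the developer's copy into it.
git init -q --bare rebuilt.git
cd dev
git remote set-url origin "$work/rebuilt.git"
git push -q origin trunk

# The rebuilt repository now holds the full history of the pushed branch.
git --git-dir="$work/rebuilt.git" log --oneline trunk
```

Only the trunk branch is pushed here. In a real recovery you would likely want to push every branch and tag the team cares about.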

The bottom line is that the benefits of always working under source control far outweigh the initial pain of converting to a DVCS. If you're developing on Windows, I highly recommend Mercurial; if Linux or Mac is your preferred environment, then Git is definitely the way to go. So go on and give it a try. I promise you won't go back to your old VCS.

Monday, May 30, 2011

Git with ssh on Windows

While Git is a great source control system that can bend to almost any source control workflow you might have, support on Windows varies from awesome (Git Extensions, msysgit) to downright awful (using the git, ssh, or http protocols). I've struggled enough with setting up an ssh server on Windows to host a central repository to warrant documenting it.

First you will need to install Cygwin with the openssh and git modules enabled.

After the installation we'll need to let cygwin know about the domain accounts we want to use. For this we'll use the mkpasswd command. For domain accounts use the '-d' option.

$ mkpasswd -d -u username >> /etc/passwd

This will append the entry to the /etc/passwd file. Do this for all of the accounts you want to add. You may need to subsequently edit the /etc/passwd file and set the appropriate home directory to /home/username especially if your company likes to have a network home drive for all of their users.
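If you'd rather not edit the file by hand, the home directory can be rewritten with awk. The sketch below works on a temporary sample file rather than the real /etc/passwd, and the entry for the user jdoe is entirely made up. It only assumes the usual colon-separated passwd layout, in which the sixth field is the home directory.

```shell
#!/bin/sh
# Sketch: point one user's home directory at /home/<username> in a passwd-style file.
set -eu

# Hypothetical sample entry in the layout name:password:uid:gid:gecos:home:shell.
passwd_file=$(mktemp)
printf '%s\n' 'jdoe:unused:11001:10513:John Doe:/cygdrive/h:/bin/bash' > "$passwd_file"

user=jdoe

# Rewrite field 6 (the home directory) for the matching user only, and print every line.
awk -F: -v OFS=: -v u="$user" '$1 == u { $6 = "/home/" u } { print }' \
    "$passwd_file" > "$passwd_file.new"
mv "$passwd_file.new" "$passwd_file"

cat "$passwd_file"
```

On a real system you would back up /etc/passwd first and check the result before relying on it.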

Now it's time to create a new bare repository.

$ git init --bare myrepo.git
$ cd myrepo.git
$ git config core.sharedRepository group

The git config command lets git know that it should use group write permissions when creating files and directories. We'll also want to set the setgid bit on the repository's directories so that newly created files and folders inherit the same group.

$ chmod -R g+ws .

Now that your repository is set up, let's set up the ssh server.

$ ssh-host-config

The script will begin by generating host key files and config files. It will then prompt you to use privilege separation. Answer yes, and answer yes again when it prompts you to create an unprivileged user.

Next the script will ask if you want to install sshd as a service. Answer yes. The next question tells you that a privileged account is needed to run the service and asks whether you want to use a name other than 'cyg_server'. Either say no, or if you already have an appropriate privileged account, answer yes and enter the name you want to use. If the account you entered does not have the appropriate permissions the script will warn you. I recommend letting the script create a 'cyg_server' account. When prompted, enter an appropriate password that conforms to your password policies.

Finally the script will ask you to enter the values for the CYGWIN environment variable. Enter 'tty acl'. This will let you use Windows security to access the file system and insert appropriate characters.

After all of that, ssh will be configured so we just need to start it.

$ cygrunsrv --start sshd

Alternatively you can go into the services list, find CYGWIN sshd and start it there.
If you start seeing errors at this point there is likely something wrong with the password you set up for the 'cyg_server' account.

Next, you will likely want to create a symlink to your repository in the root folder. This will allow the ssh URI to be more concise and friendly.

$ ln -s /path/to/your/repo /repo

Once you have the ssh server running, it's time to connect to it. Let's push our local repository to the remote.

$ git push ssh://server/repo

If you did not set up a symlink, your address will need to contain the appropriate Cygwin path to your repo, starting from /. The first time you connect, it will prompt you to add the host key to your known_hosts file. Finally you will be prompted to enter your password. At this point you have everything in place to access the repository remotely.

As a final step you can set up public/private keys in order to identify yourself to the ssh server. This will allow you to forgo having to enter your password. First, set up your ssh keys

$ ssh-keygen -t rsa

This will create two files in your ~/.ssh directory: your public key (id_rsa.pub) and your private key (id_rsa). You then need to copy the contents of your public key into the /home/username/.ssh/authorized_keys file on the server. One way of doing this is to create a share to your home directory on the server that only your account can access; you can then create an authorized_keys file and paste the contents in using Notepad.
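If you can get a shell on the server, the same thing can be done from the command line. The sketch below uses stand-ins so that it can run anywhere: server_home plays the role of /home/username, and the key line is a fake placeholder rather than a real public key. The restrictive permissions are included because sshd is typically strict about who can read or write these files.

```shell
#!/bin/sh
# Sketch: install a public key into authorized_keys with restrictive permissions.
set -eu

# Stand-ins for the example: on a real server, server_home would be /home/username
# and pubkey would be the id_rsa.pub file copied over from the client.
server_home=$(mktemp -d)
pubkey="$server_home/id_rsa.pub"
echo "ssh-rsa AAAAexamplekey user@client" > "$pubkey"

# Create the .ssh directory, accessible only by its owner.
mkdir -p "$server_home/.ssh"
chmod 700 "$server_home/.ssh"

# Append rather than overwrite, so any keys already authorized are kept.
cat "$pubkey" >> "$server_home/.ssh/authorized_keys"
chmod 600 "$server_home/.ssh/authorized_keys"

cat "$server_home/.ssh/authorized_keys"
```

Appending with >> means the step can be repeated for each additional client key you want to authorize.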

Git can still be a pain to work with on Windows, but hopefully this will make your life a little easier.

Update:
If you would rather set up git via the http protocol, have a look at this post if you want to go via Apache, or check this out for an IIS solution.