Sorting topologically for friendlier configuration
I’ve been working on the bootstrap mechanism of service. One of the problems I’m addressing is the relative fragility of its configuration mechanism. Some services depend on other services (e.g. a “connection handler” service depends on both “authentication” and “database access” services). With the old configuration mechanism, the order these services were specified in the configuration file mattered. The bootstrap mechanism would always start them in the order specified, which meant several possible configuration files would result in a broken system since some services may end up started before the services they depend on. Since I’m making the configuration file less fragile, I’m making the new bootstrap system always start services after their dependencies. The dependency information is a directed acyclic graph — there can be no cycles of dependencies. It topologically sorts the services to find the start order.
A brief, contrived example implementation of this:
class Service: depends = () class AService(Service): depends = () class BService(Service): depends = (AService,) class CService(Service): depends = (BService,) class DService(Service): depends = (AService,) class EService(Service): depends = (CService, DService) class FService(Service): depends = () def toposort(services): sorted =  def visit(service): if service in sorted: return for depend in service.depends: visit(depend) sorted.append(service) for service in services: visit(service) return sorted def show(func, services): print func.__name__ for service in func(services): print 'Start: ',service.__name__, ' depends on ',', '.join([s.__name__ for s in service.depends]) services = [FService, EService, DService, CService, AService] show(toposort, services) ---- toposort Start: FService depends on Start: AService depends on Start: BService depends on AService Start: CService depends on BService Start: DService depends on AService Start: EService depends on CService, DService
A simple way to sort the nodes in a DAG is using a depth first
traversal. In this example, if the graph is too deep we’ll blow the
stack. In my install of Python,
sys.getrecursionlimit limit is
1000. My own stack would pop long before Python’s if the depth of
our dependency graph was the many hundreds it would take to blow
the stack with the obvious recursive implementation.
But, I’ll go ahead and show a version that doesn’t use recursion anyway:
According to some testing, toposort2 is much slower than toposort:
Of course, a real version of this would also need to watch for cycles and report them in a way more friendly than blowing the stack (toposort) or running until it ran out of memory or was killed (toposort2).
toposort2 could be made to run faster, but I’ll save the “making Python code faster” effort for situations when it matters. If your algorithm is naturally expressed recursively it is best to express it recursively unless there’s really a compelling reason to do otherwise.