Tag: Software development

  • AI Tooling – Making The Scheduler

    It’s Reimplemented Cron!

    My WhatsApp Game bot needs scheduled tasks. At first it’s database cleanup, to clear down expired mementos and conversation state. I’ll add game completion and tidying when those features are implemented. The task for the AI was “Implement Scheduling”.

    It’s written a pretty comprehensive, database-backed Cron implementation.

    Would I have done this? 20 years ago, working on AIX systems, I’d have used Cron. Today, professionally, working with Kubernetes, I’d look at Kubernetes’ CronJob scheduler. It could be a quick and dirty thing to add a cron job – every 15 minutes

    DELETE FROM mementos WHERE expires_at < NOW()

    I challenged Claude on this, and it did have a valid point. Having one binary gives me easy deployment in my early dev days. I can split out scheduling later when I need to, but for now I can push a container image to my dev server and everything will update. Management is easier.

    The mementos will likely move to REDIS, maybe something I should have already done (I was concerned about the cost of REDIS/ElastiCache if I used AWS for this, but have a local dev server for now). Tasks like game ending are not so easy to do in Cron without producing a purpose-made binary from the Go code.

    It did it in 15 minutes

    The scheduler took about 15 minutes to write. Interestingly it’s used a Store Pattern rather than bringing GORM into the mid-tier (see my earlier post on domain layer design). This could be because it had a previous attempt as reference. Either way, this is quite fast.

    I had to ask for concurrency (and other things)

    The first version had single-shot scheduled tasks which would reschedule themselves. This seemed flaky, so I asked for recurring job support. The code was adapted and a call was added to main to create the jobs. This would result in a new set of jobs being added every time the program ran. It’s a bug that would be spotted if paying attention to the database; otherwise, if this had been vibe coded, performance would have degraded over time. I asked Claude to implement an idempotent startup and to reinforce it with a database constraint. The constraint is keyed on Job Type and a Uniqueness Key, allowing me to scope jobs.
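    The idempotent registration can be sketched as an upsert guarded by that uniqueness constraint. The table and column names below are my guesses for illustration, not the generated schema:

```go
package main

import (
	"fmt"
	"strings"
)

// A sketch of idempotent job registration. A unique constraint on
// (job_type, uniqueness_key) backs up the application logic: re-running
// startup simply hits DO NOTHING instead of inserting duplicate jobs.
const ensureJobQuery = `
INSERT INTO scheduled_jobs (job_type, uniqueness_key, interval_minutes)
VALUES ($1, $2, $3)
ON CONFLICT (job_type, uniqueness_key) DO NOTHING`

// isIdempotent reports whether a statement is safe to run on every startup.
func isIdempotent(query string) bool {
	return strings.Contains(query, "ON CONFLICT") &&
		strings.Contains(query, "DO NOTHING")
}

func main() {
	fmt.Println("startup registration idempotent:", isIdempotent(ensureJobQuery))
}
```

    The matching DDL would be a unique index over the same two columns, so even a buggy future code path cannot duplicate jobs.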

    Claude does tell me that job taking is concurrency-safe. It uses Postgres’ SKIP LOCKED feature, which is a great tool.
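    For the curious, this is the kind of claim statement SKIP LOCKED enables. The table and column names are my guesses, not Claude’s generated code: several workers can poll concurrently, each claims one due job, and rows already locked by another worker are skipped rather than waited on.

```go
package main

import (
	"fmt"
	"strings"
)

// claimJobQuery claims the next due job without blocking on other workers.
// The subselect takes a row lock; SKIP LOCKED makes contended rows invisible
// to this query instead of making it wait, so two workers never fight.
const claimJobQuery = `
UPDATE scheduled_jobs
SET status = 'running', claimed_at = NOW()
WHERE id = (
	SELECT id FROM scheduled_jobs
	WHERE status = 'pending' AND run_at <= NOW()
	ORDER BY run_at
	LIMIT 1
	FOR UPDATE SKIP LOCKED
)
RETURNING id`

// skipsLockedRows reports whether the claim avoids blocking on other workers.
func skipsLockedRows(query string) bool {
	return strings.Contains(query, "FOR UPDATE SKIP LOCKED")
}

func main() {
	fmt.Println("non-blocking claim:", skipsLockedRows(claimJobQuery))
}
```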

    I need to manually review the code. Claude has run out of tokens for this morning so I have time (until the family wake up) to look at it. I’ll check that the job taking is a short transaction and state management is used. I’ll need to understand how failed jobs are handled.

    It did the tests while I went for a run

    A nice thing for the hobbyist – I left it writing the end to end tests for the system while I went for a run. This could be why it’s run out of tokens! I’ve been manually testing so far. I want to refactor the command routing infrastructure to allow me to pass continuation state to the AI Agent code and provide the AI Agent with tools to ask the user questions.

    Footnote

    On review of the code: There’s a fair amount of duplication in there – two almost identical registry classes for example. I also think it won’t work so well on single-shot jobs when I need them. I can fix these things over time, but I have more valuable things to spend time on now. I’ll ask Claude to factor the 70 or so lines of initialisation code out of main.go then move on.

  • Experiments in Go – Three Tier Architecture in a WhatsApp Game-Bot

    I am currently developing a game-bot which uses WhatsApp as its user interaction layer. This bot will implement a Scout wide game that we play in an urban environment. We currently play over WhatsApp with each team in a WhatsApp group with the leaders. The leaders respond to messages from the teams and track the game on paper – a slow and error-prone system. The last game we played had me sitting in McDonald’s on a Tuesday night quickly putting together a spreadsheet to track progress and keep score while the Scouts ran around (and sometimes sheltered from the rain) outside.

    The Three Tier Architecture

    The system I’m writing uses multiple presentation layers. It has a WhatsApp textual interface, a web interface (for game organisers) and maybe one day a mobile app. I’m finding that, with the help of AI Agents for natural language input, the WhatsApp interface is the most promising at the moment. My interface supports both “slash-commands” and natural language, so WhatsApp alone already presents two presentation layers.

    I already need to separate business logic such as game actions from the presentation layer. My presentation layer deals only with taking instruction from the users and presenting the results. My business layer implements things like “What happens when a player reaches the finish?”

    The Domain Layer provides data access. It provides functions like “Create a player record”, “Update the score”. It presents the database in a more conceptual business language so that the business layer can be coded in terms of business concepts. Dependencies in this structure are strictly downwards. Each layer knows only about the layers below it.

    One question in this type of architecture is “Do we allow the presentation layer to directly access the domain, missing out the middle tier?” More modern systems I have used do this. It increases coupling between the layers, but avoids marshalling of data across layers. The domain layer and business layer are in the same codebase at the moment so this coupling can be made. I am allowing this coupling for simple read operations. Things would have to change if I ever move to a microservices architecture, but I expect the impact to be small.

    I will focus on my Domain Layer decisions in this post. My Mid-Tier and Presentation Layer are large topics in their own right and deserve their own series of posts. For context, I am using a Command Pattern in my mid-tier to separate transaction lifecycle from business logic and allow composition of reusable business logic “commands”. Method signatures in the domain layer reflect this design. I started with Command Classes (traditional Object-Oriented programming) but have moved to using functions (lambdas), a nice feature of more modern languages.

    Attempt One – Stateless Stores, Raw SQL

    My initial pattern was based in many ways on a project I worked on in the early 2000s to help a large car manufacturer manage its dealer network. That system had four or five main objects and a series of “Use Cases” that worked over them. The Use Cases were the Commands in the business logic layer. It was a project that drove home the reason that database tables use “meaningless” keys. The dealers’ own systems were keyed on a “Dealer Code”, a short string whose first character indicated the type of dealer. They restructured their dealer network, causing huge difficulties for their systems. Our system used surrogate keys and stored Dealer Name and Dealer Type as properties, so all we needed to do was accept the new data.

    My domain layer was generated quite quickly using Claude Code from a textual description of the conceptual model. Code generation of any kind fundamentally changes the cost of building domains. The car dealer project used a PERL script to generate the entire domain layer codebase from short text files with table definitions. If anything, that PERL script was more consistent than the AI solution, but also less flexible. The variation in the Claude code (even within one domain object) means that I have to pick an example. My refactor (Attempt Two below) has me paying a lot more attention to Claude’s output.

    Each domain object has its own package, for example “src/domain/game”. The domain object is in game.go. Types were used for enums:

    package game
    // ............
    
    // GameStatus represents the lifecycle state of a game
    type GameStatus string
    
    const (
    	GameStatusCreating   GameStatus = "creating"   // Configuration only, admin setting up
    	GameStatusJoining    GameStatus = "joining"    // Registration open, teams can join
    	GameStatusActive     GameStatus = "active"     // Game is live and being played
    	GameStatusCompleting GameStatus = "completing" // Game time ended, late penalty phase
    	GameStatusCompleted  GameStatus = "completed"  // Game finished, read-only
    )
    

    The domain object itself is a struct. Here are some of the fields. Note the use of the database/sql null types (sql.NullTime, sql.NullString) to handle NULLs. This is something I found significant in my GORM refactor.

    // Game represents a single instance of a game with configured rules and lifecycle
    type Game struct {
    	ID                       int64
    	Title                    string
    	GameCode                 string
    	AdminPasswordHash        string
    	Status                   GameStatus
    	GameType                 GameType
    	CreatedAt                time.Time
    	EndTime                  sql.NullTime
    	StartTime                sql.NullTime
    	JoiningWindowStart       sql.NullTime
    	CompletedAt              sql.NullTime
    	JoiningPasscode          sql.NullString
      // ......
    }

    The file store.go contains the interface definitions of methods to access the store. The domain layer is purely responsible for data access, so business logic like choosing a game code is performed by the business layer. The game creation method took a subset of the fields in a Params object. Another method, for example to change the game state, would take only the parameters it needs.

    // GameCreateParams contains parameters for creating a new game
    type GameCreateParams struct {
    	Title                    string
    	GameCode                 string
    	AdminPasswordHash        string
    	JoiningPasscode          string    // Plain text passcode for joining
    	GameType                 GameType  // Defaults to score_attack if empty
    	CreatedByUserID          uuid.UUID // ID of user creating the game
    }

    Some of the methods here crosscut database objects. For example, I placed the join table for Game Controllers (administrators/organisers) in the Game domain. This can be seen in the AddController and IsController methods on the game.Store interface:

    // Store defines the interface for game data access operations
    type Store interface {
    	// Create creates a new game and returns it with the generated ID
    	Create(ctx *cmdcontext.CommandContext, params GameCreateParams) (*Game, error)
    
    	// GetByID retrieves a game by its ID
    	GetByID(ctx *cmdcontext.CommandContext, id int64) (*Game, error)
    
    	// GetByGameCode retrieves a game by its game code
    	GetByGameCode(ctx *cmdcontext.CommandContext, gameCode string) (*Game, error)
    
    	// Update updates a game's settings
    	Update(ctx *cmdcontext.CommandContext, id int64, params GameUpdateParams) (*Game, error)
    
    	// UpdateStatus transitions a game to a new status
    	UpdateStatus(ctx *cmdcontext.CommandContext, id int64, newStatus GameStatus) error
    
    	// AddController adds a user as a game controller (admin)
    	AddController(ctx *cmdcontext.CommandContext, gameID int64, userID uuid.UUID) error
    
    	// IsController checks if a user is a controller for the given game
    	IsController(ctx *cmdcontext.CommandContext, gameID int64, userID uuid.UUID) (bool, error)
    }

    A mock implementation allowed test code to run against an in-memory store. This was a nice feature and easily assembled using Claude. This made unit tests for higher level code fast. Here’s the mock method to update a game status

    // UpdateStatus transitions a game to a new status
    func (m *StoreMock) UpdateStatus(ctx *cmdcontext.CommandContext, id int64, newStatus GameStatus) error {
    	m.mu.Lock()
    	defer m.mu.Unlock()
    
    	game, exists := m.games[id]
    	if !exists {
    		return ErrGameNotFound
    	}
    
    	game.Status = newStatus
    	return nil
    }

    The postgres.go file contained the Postgres implementation. For some reason the CodePro formatting plugin in WordPress translates != to its mathematical symbol!

    // UpdateStatus transitions a game to a new status
    func (s *StorePostgres) UpdateStatus(ctx *cmdcontext.CommandContext, id int64, newStatus GameStatus) error {
    	query := `UPDATE games SET status = $1, updated_at = NOW() WHERE id = $2`
    
    	result, err := ctx.DB.ExecContext(ctx.Context(), query, newStatus, id)
    	if err != nil {
    		// Infrastructure error - wrap with context
    		return fmt.Errorf("update game status %d: %w", id, err)
    	}
    
    	rowsAffected, err := result.RowsAffected()
    	if err != nil {
    		// Infrastructure error - wrap with context
    		return fmt.Errorf("check rows affected for game %d: %w", id, err)
    	}
    
    	if rowsAffected == 0 {
    		// Domain error - game not found
    		return ErrGameNotFound
    	}
    
    	return nil
    }

    The stores are stateless and could be singletons. To allow testing, a central Stores object was used to hold references to all of the Store interfaces. This was passed to the higher-level components through constructor injection. Components declared an interface with just the stores that they need. Go’s implicit interface matching allowed either the large central StoreProvider or a smaller test-specific provider to satisfy it. This made testing easy throughout the system. It was even possible to perform quite complex end-to-end tests with the mock stores.
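    A minimal sketch of that pattern follows; the names (GameReader, ScoreService, StoreProvider) are illustrative, not the real codebase. The component declares only what it needs, and both the full provider and a test mock satisfy it without ever naming the interface:

```go
package main

import "fmt"

// GameReader is the narrow interface a component declares for itself.
type GameReader interface {
	GetTitle(id int64) (string, error)
}

// StoreProvider is the large central object holding every store. It
// satisfies GameReader implicitly, without ever mentioning it.
type StoreProvider struct{}

func (p *StoreProvider) GetTitle(id int64) (string, error) { return "from db", nil }
func (p *StoreProvider) GetScore(id int64) (int, error)    { return 0, nil }

// ScoreService depends only on GameReader, not on StoreProvider.
type ScoreService struct {
	games GameReader
}

func (s *ScoreService) Describe(id int64) string {
	title, err := s.games.GetTitle(id)
	if err != nil {
		return "unknown game"
	}
	return "Game: " + title
}

// mockGames is all a unit test needs to construct a ScoreService.
type mockGames struct{ titles map[int64]string }

func (m *mockGames) GetTitle(id int64) (string, error) {
	if t, ok := m.titles[id]; ok {
		return t, nil
	}
	return "", fmt.Errorf("game %d not found", id)
}

func main() {
	// Compile-time check: the big provider satisfies the narrow interface.
	var _ GameReader = (*StoreProvider)(nil)
	svc := &ScoreService{games: &mockGames{titles: map[int64]string{1: "Town Chase"}}}
	fmt.Println(svc.Describe(1)) // prints "Game: Town Chase"
}
```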

    There is a lot of boilerplate code in this solution, but AI code generation makes this easy. The risk is inconsistent code. I wrote this using the special offer of free Claude Code tokens in a weekend, at times giving Claude Web instructions on my phone on a bus journey with my family. Production code needs a closer watch. The use of AI generation meant that some subtleties were missed, for example the detection of partial updates in the Update struct. This led to bugs.

    Attempt Two – Object Relational Mapping with GORM

    Left with a non-functional system and a lot of AI generated code, I thought I’d start again. I tore out the old domain layer and moved the higher layer logic to a side directory. I left a lot of the framework code intact, but am refining it as I go use case by use case with the care that is really needed.

    The use of GORM can reduce the domain layer to just models. This allowed me to make a flat domain package. Here’s the game structure (or a subset of it) in GORM. The enums remain the same as before.

    type Game struct {
    	ID                             int64      `gorm:"primaryKey;autoIncrement" json:"id"`
    	Title                          string     `gorm:"type:varchar(255);not null" json:"title"`
    	GameCode                       *string    `gorm:"type:varchar(8);uniqueIndex" json:"game_code,omitempty"`
    	AdminPasswordHash              *string    `gorm:"type:text" json:"-"` // Exclude from JSON
    	Status                         GameStatus `gorm:"type:game_status;not null;default:'creating'" json:"status"`
    	GameType                       GameType   `gorm:"type:game_type;not null;default:'score_attack'" json:"game_type"`
    	JoiningOpensAt                 *time.Time `gorm:"type:timestamptz" json:"joining_opens_at,omitempty"`
      // ......
     	InitialScore                   int        `gorm:"not null;default:0" json:"initial_score"`
      // ......
      
    	// Relationships
    	Creator         *User    `gorm:"foreignKey:CreatedBy" json:"creator,omitempty"`
    
    	// One-to-many relationships
    	Controllers []GameController `gorm:"foreignKey:GameID" json:"controllers,omitempty"`
    }
    
    // TableName specifies the table name
    func (Game) TableName() string {
    	return "games"
    }

    A nice feature of GORM is that it provides hook functions, so I can always ensure that the Last Modified timestamp is maintained.

    // BeforeUpdate hook to update the updated_at timestamp
    func (g *Game) BeforeUpdate(tx *gorm.DB) error {
    	g.UpdatedAt = time.Now()
    	return nil
    }

    I found myself writing helper functions in the domain layer (in this case in game.go). This is putting strain on the flat domain package structure, so I may at some point break it out again to a package for each domain area. For example I have derived values

    // CanAcceptPlayers a game can accept players once joining is opened up until
    // the game ends.
    func (g *Game) CanAcceptPlayers() bool {
    	return g.Status == GameStatusJoining || g.Status == GameStatusActive
    }

    My helper methods to manage game controllers are now in the game_controller.go domain file, alongside the table that they operate over.

    type AddGameControllerParams struct {
    	GameID       int64
    	UserID       uuid.UUID
    	IsOwner      bool
    	AccessSource AccessSource
    }
    
    func AddGameController(db *gorm.DB, params AddGameControllerParams) (*GameController, error) {
    	gc := &GameController{
    		GameID:       params.GameID,
    		UserID:       params.UserID,
    		IsOwner:      params.IsOwner,
    		AccessSource: params.AccessSource,
    	}
    	return gc, db.Create(gc).Error
    }
    
    // IsUserControllerOfGame checks if a user is a controller (admin) for a specific game
    func IsUserControllerOfGame(db *gorm.DB, gameID int64, userID uuid.UUID) (bool, error) {
    	var count int64
    	err := db.Model(&GameController{}).
    		Where("game_id = ? AND user_id = ?", gameID, userID).
    		Count(&count).Error
    	if err != nil {
    		return false, err
    	}
    	return count > 0, nil
    }

    My business layer uses these and direct GORM methods. This is a snippet of src/commands/admin/new_game.go (coloured differently to denote a different layer).

    	// Create the game
    	newGame := &domain.Game{
    		Title:                    req.Title,
    		GameCode:                 &gameCode,
    		AdminPasswordHash:        &passwordHash,
    		Status:                   domain.GameStatusCreating,
    		GameType:                 domain.GameTypeScoreAttack,
    		InitialScore:             0,
    		JoiningPasscode:          &joiningPasscode,
    		CreatedBy:                *req.UserId,
    	}
    
    	// Use Select to explicitly include InitialScore even though it's zero
    	result := ctx.GormDB().Select("Title", "GameCode", "AdminPasswordHash", "Status", "GameType", "InitialScore", "JoiningPasscode", "CreatedBy").
    		Create(newGame)
    	if result.Error != nil {
    		return nil, fmt.Errorf("create game: %w", result.Error)
    	}
    
    	_, err = domain.AddGameController(ctx.GormDB(), domain.AddGameControllerParams{
    		GameID:       newGame.ID,
    		UserID:       *req.UserId,
    		IsOwner:      true,
    		AccessSource: domain.AccessSourceCreator,
    	})
    	if err != nil {
    		return nil, fmt.Errorf("create game: set owner: %w", err)
    	}
    
    	// Update user's current context to switch them to administering this game
    	err = ctx.GormDB().Model(&domain.User{}).
    		Where("id = ?", req.UserId).
    		Updates(map[string]interface{}{
    			"current_game_id": newGame.ID,
    			"current_context": domain.UserContextAdministering,
    		}).Error
    	if err != nil {
    		return nil, fmt.Errorf("create game: update user context: %w", err)
    	}

    It is here that I found my first subtlety of GORM. It uses the zero value in create and save operations to determine whether or not a field is affected. For this reason, if you try to set a team’s score to 0 by calling Save, it will not work. The code has to explicitly request that the field be updated, either using Select() or Updates().

    It is possible to mark the field to always save, but then it would be far too easy to set a team’s score to 0 by not including the current score in an unrelated update. In fact, in a concurrent system it would be dangerous to write fields that are not needed. I’ll discuss concurrency below as it’s an interesting topic. Another option is to use a pointer in the structure, but this makes the field nillable, which is conceptually wrong. The end result is this more explicit code.
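    The ambiguity GORM faces can be shown without GORM at all. This stand-alone demo (a loose mimic, not GORM’s actual implementation) decides which struct fields to write the way a zero-value-based ORM must: anything still at its type’s zero value looks “unset”, so an explicit Score of 0 silently vanishes.

```go
package main

import (
	"fmt"
	"reflect"
)

// TeamUpdate is a hypothetical update struct, like the ones passed to GORM.
type TeamUpdate struct {
	Name  string
	Score int
}

// changedFields treats zero-valued fields as "not provided" – Go offers no
// way to distinguish "unset" from "explicitly set to the zero value".
func changedFields(u TeamUpdate) []string {
	var out []string
	v := reflect.ValueOf(u)
	t := v.Type()
	for i := 0; i < v.NumField(); i++ {
		if !v.Field(i).IsZero() {
			out = append(out, t.Field(i).Name)
		}
	}
	return out
}

func main() {
	// Setting Score to 0 looks identical to never setting it at all,
	// so a struct-based update would silently drop it.
	fmt.Println(changedFields(TeamUpdate{Name: "Red", Score: 0}))
}
```

    Hence the explicit Select("InitialScore", …) in my creation code: naming the field removes the guesswork.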

    I have yet to see how unit tests work in this GORM based system. Claude is writing tests and talks of a GORM testing framework, so I’ll look forward to seeing how this works.

    Concurrent Updates

    Part of my reason to prefer raw SQL was the need to control concurrent updates to team scores. A team score update is an example of a Read-Calculate-Write operation, so it faces a race condition if two or more processes attempt it at the same time. A player action can affect the scores of multiple teams, so with multiple players running around town affecting their own and other teams’ scores I had to make this code safe.

    The answer is to take out a database lock. The update code has to be fast so that the lock is not held long. The scope of the lock is the affected teams. This reduces the chance of collision, however there is a deadlock risk. If the code locks the player’s own team record and then adds other teams as knock-on effects are calculated, process A can lock A then B while process B wants to lock B then A. The system deadlocks if A and B take their own team locks at the same time and then wait on each other for their second locks. The answer is to work out the set of affected teams at the start, then take the locks in ID order. If locking across tables or any other resource, it is important that all processes use the same order – User then Team, for example.
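    The ordering step is small and pure, so it can be sketched (and tested) on its own. The row locks themselves would then be taken in this order, for example with SELECT … FOR UPDATE per team ID; the function name here is mine, not the real code.

```go
package main

import (
	"fmt"
	"sort"
)

// lockOrder returns the affected team IDs deduplicated and sorted.
// Taking row locks in this global order means two concurrent updates can
// never each hold a lock the other is waiting on: whoever reaches the
// lowest contested ID first wins, and the other simply queues behind it.
func lockOrder(teamIDs []int64) []int64 {
	seen := make(map[int64]bool)
	out := make([]int64, 0, len(teamIDs))
	for _, id := range teamIDs {
		if !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Process A affects teams {42, 7}; process B affects {7, 19, 42}.
	// Both compute the same lock order, so no deadlock is possible.
	fmt.Println(lockOrder([]int64{42, 7, 42, 19})) // prints [7 19 42]
}
```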

    Another type of update I call the Blind Write (possibly based on the term Blind Update in stage lighting). All I care about is that when my transaction commits the player is called Fred. There’s no maths, no reading, just UPDATE player SET name = 'Fred' WHERE id = :playerId. I don’t need a lock in this example. Last to commit wins. For this to work I must only update the fields whose values I want to set. If I were to read the whole record, make changes, then save the whole record, I would introduce a Read-Calculate-Write race condition, even though “Calculate” in this case was “Do nothing”.

    What I have learned so far

    This is a big refactor. It’s going to take time, even with AI tooling (especially now the free credit offer is gone and I’m constrained by usage limits). The refactor would have been cheaper if tried sooner. The rush to get a prototype working in about a weekend meant that I had a large system that didn’t work well at all. It also meant that I had something to show the other Scout Leaders that at least demonstrated the idea of what I was making. Refactoring is allowing me to bring back subsystems and functionality slowly, with care, and step-by-step manual testing. It will be better for it.

    AI does change the landscape by enabling rapid prototyping and iteration. There is an interesting question of how much I should strive for what I see as code perfection versus accepting the AI’s output. What I’m looking at now is not the way I’d have done it, but is it bad enough to demand a rewrite? This is a question I’ve faced so many times working with humans! I tend to ask pointed questions, something I’m continuing with the AI: “Explain this situation…”. The AI has handled it, but not in the way I expected.

    AI approach:

    * Read Pending Action Fields
    * Are they invalid?
      * **Clear the fields**
    * Is there a pending action?
      * **Clear the fields**
      * Is the user starting a new command?
        * Do normal routing instead
      * else handle pending action
    * else do normal routing

    My approach:

    * Read Pending Action Fields
    * Are they set?
      * **Clear the fields**
    * Is the user starting a new command?
      * do the command
    * else-if there’s a pending action
      * do the action

    I have to say, having typed out the above, I’ll rework it. My way is simpler and always clears down the fields, a fundamental contract in this case.

    It’s expensive for a team leader to ask a junior member to rework something, especially if that work has gone all the way to code review before being checked. I’ve mitigated this in the past by providing support for new developers and having technical planning meetings before starting work. Tools like SpecKit formalise that with AI, and I have my own “SpecKit Lite” on this project. The expense with AI is the token cost, something quite notable with a personal account. Rework is an opportunity cost in either case, but the cost is lower with AI.

    Conclusion – Which Pattern?

    The system will be greatly improved not because of this refactor but because I am now slowly working through the features, checking as I go. This is more real coding than the vibe coding of the weekend with Claude Web.

    I think the differences between approaches are marginal. Modern AI tools make creating boilerplate easy so this is no longer a cost. My past use of the raw SQL Store pattern had me using PERL scripts to make the boilerplate which was even more reliable than AI.

    One test will come soon enough when I move short-lived data to REDIS. The Store-based system would make this easy – use a REDIS backend for those Stores. The refactor may be larger having exposed GORM to my middle tier, but again, given AI, not too large. The prime candidate is my Memento Store, which is already accessed via CreateMemento and GetMemento methods.