Nayeem Hasan

Exploring the world of software

Go: Build your own linter

Posted at — Jan 17, 2024

Go provides rich support for lexical analysis, parsing, and type checking of Go packages. Using these tools, we can create our own linter to detect issues or perform refactorings.

To make matters easier, the golang.org/x/tools module provides the analysis package, with which we can create a linter or static analyzer without manually parsing or loading packages. The analysis package provides a nice API for writing the business logic of our linter or static analyzer and testing it effectively.

A simple linter

Let’s build a simple linter that detects a violation of Go’s naming convention. As per the documentation, the convention in Go is to use MixedCaps or mixedCaps rather than underscores to write multiword names. So, this linter will catch uses of underscores in variable names like mixed_caps.

Before diving into the code, we need to find out in which cases we will issue the warning: wherever a variable is being declared. Now, a variable declaration may look like this:

func _() {
    var (
        x = 10  // (1)
    )
    var y int   // (2)
    z := 10     // (3)
}

According to go/ast, the variable declarations in (1) and (2) are DeclStmt nodes, and the short variable declaration in (3) is an AssignStmt.

What is a DeclStmt?

// A DeclStmt node represents a declaration in a statement list.
DeclStmt struct {
    Decl Decl // *GenDecl with CONST, TYPE, or VAR token
}

// A GenDecl node (generic declaration node) represents an import,
// constant, type or variable declaration. A valid Lparen position
// (Lparen.IsValid()) indicates a parenthesized declaration.
GenDecl struct {
    Doc    *CommentGroup // associated documentation; or nil
    TokPos token.Pos     // position of Tok
    Tok    token.Token   // IMPORT, CONST, TYPE, or VAR
    Lparen token.Pos     // position of '(', if any
    Specs  []Spec
    Rparen token.Pos // position of ')', if any
}

A DeclStmt represents a declaration, and its Decl field is of type Decl, which is an interface. It is implemented by *GenDecl, and the GenDecl type contains a Specs field, a slice of Spec, which is also an interface. The Decl interface is also implemented by other types like FuncDecl (which represents a function declaration), but those are not relevant here.

// A ValueSpec node represents a constant or variable declaration
// (ConstSpec or VarSpec production).
ValueSpec struct {
    Doc     *CommentGroup // associated documentation; or nil
    Names   []*Ident      // value names (len(Names) > 0)
    Type    Expr          // value type; or nil
    Values  []Expr        // initial values; or nil
    Comment *CommentGroup // line comments; or nil
}

The Spec interface is implemented by *ValueSpec. A ValueSpec node represents a constant or variable declaration, and its Names field contains all the identifiers in the declaration as *ast.Ident values. So, for this linter, it is enough to check the identifiers in a ValueSpec node.

Now, let’s check what an AssignStmt is.

// An AssignStmt node represents an assignment or
// a short variable declaration.
type AssignStmt struct {
	Lhs    []Expr
	TokPos token.Pos   // position of Tok
	Tok    token.Token // assignment token, DEFINE
	Rhs    []Expr
}

An AssignStmt represents a statement like x = 1 or x := 1. For this linter, we are interested in x := 1, so we check AssignStmt nodes whose Tok has the value token.DEFINE, i.e. :=. Here, DEFINE is a constant of type Token declared inside the go/token package. Also, the Lhs field is a slice of Expr, which is an interface implemented by many expression types. We are only interested in the case where the Expr is an identifier (*ast.Ident).

So, to summarize, we will check the identifiers in every ValueSpec node, and the identifiers on the left-hand side of every AssignStmt whose Tok is token.DEFINE.
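
To see these two shapes concretely before writing the analyzer, here is a small throwaway program (not part of the linter; the file name and package name are just placeholders) that parses a snippet and prints every identifier it finds in ValueSpec nodes and on the left-hand side of := assignments:

package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package demo

func _() {
	var foo_bar = 10
	baz_qux := 20
	_ = foo_bar + baz_qux
}`

func main() {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		panic(err)
	}

	// Visit every node of the AST and print the declared identifiers.
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.ValueSpec: // var or const declaration
			for _, id := range n.Names {
				fmt.Printf("%s: ValueSpec declares %s\n", fset.Position(id.Pos()), id.Name)
			}
		case *ast.AssignStmt: // short variable declaration
			if n.Tok == token.DEFINE {
				for _, lhs := range n.Lhs {
					if id, ok := lhs.(*ast.Ident); ok {
						fmt.Printf("%s: AssignStmt declares %s\n", fset.Position(id.Pos()), id.Name)
					}
				}
			}
		}
		return true
	})
}

Both foo_bar and baz_qux show up, which is exactly the set of names our linter needs to inspect.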

The code

First, we will declare a variable of type Analyzer. According to the documentation of Analyzer,

An Analyzer statically describes an analysis function: its name, documentation, flags, relationship to other analyzers, and of course, its logic.

So, the Analyzer contains some metadata about the linter and a function that holds its business logic.

var Analyzer = &analysis.Analyzer{
	Name:     "varname",                              // (1)
	Doc:      "Check snake case variable naming",     // (2)
	Run:      run,                                    // (3)
	Requires: []*analysis.Analyzer{inspect.Analyzer}, // (4)
}

func run(pass *analysis.Pass) (interface{}, error) { // (5)
    // logic of the linter
}

Now, let’s write the actual logic of the analyzer.

func run(pass *analysis.Pass) (interface{}, error) { // (1)
	anInspector := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector) // (2)

	nodeFilter := []ast.Node{ // (3)
		(*ast.AssignStmt)(nil),
		(*ast.ValueSpec)(nil),
	}
	anInspector.Preorder(nodeFilter, func(n ast.Node) { //(4)
		switch n := n.(type) {
		case *ast.ValueSpec: // (5)
			for _, id := range n.Names {
				if isSnakeCase(id.Name) {
					pass.ReportRangef(n, "avoid snake case naming convention")
				}
			}

		case *ast.AssignStmt: // (6)
			if n.Tok == token.DEFINE {
				for _, lhsExpr := range n.Lhs {
					if id, ok := lhsExpr.(*ast.Ident); ok && isSnakeCase(id.Name) {
						pass.ReportRangef(n, "avoid snake case naming convention")
					}
				}
			}
		}
	})

	return nil, nil
}

func isSnakeCase(s string) bool {
	return s != "_" && strings.ContainsRune(s, '_') // (7)
}
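
Before moving on, it is worth noting that the framework also ships a companion package, golang.org/x/tools/go/analysis/analysistest, for testing analyzers. Test sources live under a testdata directory (GOPATH-style, e.g. testdata/src/a/a.go) and mark expected diagnostics with // want "regexp" comments. A minimal sketch might look like this (the test package name and testdata layout are assumptions):

package varname_test

import (
	"testing"

	"golang.org/x/tools/go/analysis/analysistest"

	"example.com/varname"
)

func TestVarname(t *testing.T) {
	// analysistest.TestData returns the path of the testdata directory
	// next to this test file. Run loads the package pattern "a" from
	// testdata/src/a, runs the analyzer, and checks that the reported
	// diagnostics match the `// want "avoid snake case"` comments in
	// the test sources.
	analysistest.Run(t, analysistest.TestData(), varname.Analyzer, "a")
}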

Adding more functionality

Now, let’s make some improvements to prevent unwanted warnings. First, we want to prevent the analyzer from running on auto-generated files, as they often contain snake case variable names and we do not want to modify them.

A generated file contains a comment like this:

// Code generated by "stringer -type=SomeType"; DO NOT EDIT.

package foo

So, to check whether a file is generated, we need to check the comments in the file. Comments in a Go file are kept under the root element of the AST, which is *ast.File. From the doc,

type File struct {
	...
	Comments           []*CommentGroup // list of all comments in the source file
	GoVersion          string          // minimum Go version required by //go:build or // +build directives
}

So, we need to check the Comments field of the *ast.File. But how can we get the *ast.File? The callback function passed to the Preorder method only receives an ast.Node, and there is no parent or ancestor information associated with it that would let us find the root element, the *ast.File node.

anInspector.Preorder(nodeFilter, func(n ast.Node) {
	// ...
})

Fortunately, *inspector.Inspector provides another method, WithStack, whose callback receives the current traversal stack in the stack parameter. The first element of the stack is always the *ast.File node.

anInspector.WithStack(nodeFilter, func(n ast.Node, push bool, stack []ast.Node) (proceed bool) {
	// ...
})

Let’s modify the analyzer to use the WithStack method.

func run(pass *analysis.Pass) (interface{}, error) {
	
	...

	anInspector.WithStack(nodeFilter, func(n ast.Node, push bool, stack []ast.Node) (proceed bool) {
		if isGeneratedFile(stack[0]) { // (1)
			return false
		}
		switch n := n.(type) {
			...
		}

		return true
	})

	return nil, nil
}

var generatedCodeRe = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`) // (2)

func isGeneratedFile(node ast.Node) bool {
	if file, ok := node.(*ast.File); ok {
		for _, c := range file.Comments {
			if c.Pos() >= file.Package { // (3)
				return false
			}
			for _, cc := range c.List {
				if generatedCodeRe.MatchString(cc.Text) { // (4)
					return true
				}
			}
		}
	}
	return false
}

We can also add a flag to the analyzer to control whether generated files should be analyzed. To do that, we will use the Flags field of the Analyzer.

var Analyzer = &analysis.Analyzer{
	Name:     "varname",
	Doc:      "Check snake case variable naming",
	Run:      run,
	Flags:    flags(), // (1)
	Requires: []*analysis.Analyzer{inspect.Analyzer},
}

var analyzeGenerated *bool

func flags() flag.FlagSet {
	var fs flag.FlagSet
	analyzeGenerated = fs.Bool("analyze-generated", false, "analyze generated file") // (2)
	return fs
}

Now, before checking whether a file is generated, we just add an extra check for whether the flag is enabled.

if !*analyzeGenerated && isGeneratedFile(stack[0]) { // (3)
	return false
}

Running the analyzer

Here is all the code for the analyzer:

package varname

import (
	"flag"
	"fmt"
	"go/ast"
	"go/token"
	"regexp"
	"strings"

	"golang.org/x/tools/go/analysis"
	"golang.org/x/tools/go/analysis/passes/inspect"
	"golang.org/x/tools/go/ast/inspector"
)

var Analyzer = &analysis.Analyzer{
	Name:     "varname",
	Doc:      "Check snake case variable naming",
	Run:      run,
	Flags:    flags(),
	Requires: []*analysis.Analyzer{inspect.Analyzer},
}

var analyzeGenerated *bool

func flags() flag.FlagSet {
	var fs flag.FlagSet
	analyzeGenerated = fs.Bool("analyze-generated", false, "analyze generated file")
	return fs
}

func run(pass *analysis.Pass) (interface{}, error) {
	anInspector := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector)

	nodeFilter := []ast.Node{
		(*ast.AssignStmt)(nil),
		(*ast.ValueSpec)(nil),
	}

	anInspector.WithStack(nodeFilter, func(n ast.Node, push bool, stack []ast.Node) (proceed bool) {
		if !*analyzeGenerated && isGeneratedFile(stack[0]) {
			return false
		}
		switch n := n.(type) {
		case *ast.ValueSpec:
			for _, id := range n.Names {
				if isSnakeCase(id.Name) {
					pass.ReportRangef(n, "avoid snake case naming convention")
				}
			}

		case *ast.AssignStmt:
			if n.Tok == token.DEFINE {
				for _, lhsExpr := range n.Lhs {
					if id, ok := lhsExpr.(*ast.Ident); ok && isSnakeCase(id.Name) {
						pass.ReportRangef(n, "avoid snake case naming convention")
					}
				}
			}
		}

		return true
	})

	return nil, nil
}

var generatedCodeRe = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

func isGeneratedFile(node ast.Node) bool {
	if file, ok := node.(*ast.File); ok {
		for _, c := range file.Comments {
			if c.Pos() >= file.Package {
				return false
			}
			for _, cc := range c.List {
				if generatedCodeRe.MatchString(cc.Text) {
					return true
				}
			}
		}
	}
	return false
}

func isSnakeCase(s string) bool {
	return s != "_" && strings.ContainsRune(s, '_')
}

Save this code in a file in the root directory of the repository. Now, let’s create a main.go file inside the cmd/varname directory and paste:

package main

import (
	"example.com/varname"
	"golang.org/x/tools/go/analysis/singlechecker"
)

func main() {
	singlechecker.Main(varname.Analyzer) // (1)
}

Now run go build inside cmd/varname and we will get an executable which can be run as a CLI app. For example, we can run

varname ./...

inside a Go project to invoke the linter.

So, what is singlechecker.Main doing here? It turns a single analyzer into a standalone command: it parses the command-line arguments and flags (including the flags we registered on the analyzer), loads the packages matched by the given patterns, runs the analyzer on them, and reports the diagnostics.

Also, if we have multiple analyzers, we can invoke all of them using multichecker.Main.
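
As a sketch, a multichecker-based main might look like the following; the printf analyzer is just one of the standard passes shipped with x/tools, included here only as an example of a second analyzer:

package main

import (
	"golang.org/x/tools/go/analysis/multichecker"
	"golang.org/x/tools/go/analysis/passes/printf"

	"example.com/varname"
)

func main() {
	// multichecker.Main builds a single CLI that runs all the given
	// analyzers over the packages named on the command line.
	multichecker.Main(
		varname.Analyzer,
		printf.Analyzer,
	)
}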

Running as a CLI

Create a file with the contents given below:

package main

var foo_bar string

var (
	num_of_var int
)

func _() int {
	sum_of_value := 0
	return sum_of_value
}

Now, run varname ./... and check the output.

/home/nayeem/my-codes/pg/foo.go:3:5: avoid snake case naming convention
/home/nayeem/my-codes/pg/foo.go:6:2: avoid snake case naming convention
/home/nayeem/my-codes/pg/foo.go:10:2: avoid snake case naming convention

Now, run the analyzer on a generated file like below:

// Code generated by "stringer -type=OpType"; DO NOT EDIT.

package main

const _OpType_name = "OpAddOpSubOpMulOpDiv"

var _OpType_index = [...]uint8{0, 5, 10, 15, 20}

Invoking varname does not give any warning for this file, but we can still analyze it with the analyze-generated flag.

varname -analyze-generated .

Now, the warning will show up.

/home/nayeem/my-codes/pg/optype_string.go:17:7: avoid snake case naming convention
/home/nayeem/my-codes/pg/optype_string.go:19:5: avoid snake case naming convention

Summary

With the analysis package, we have created a simple analyzer. It is also possible to do the same thing without the analysis package.

In that case, we have to load and parse the packages ourselves (for example with go/parser or golang.org/x/tools/go/packages), traverse the AST, and build our own flag handling, diagnostic reporting, and testing setup.

With the analysis package, we can get rid of all this boilerplate code and focus only on the actual logic of the analyzer.
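
For comparison, here is a rough sketch of what the manual route might look like, using golang.org/x/tools/go/packages to load the code ourselves. It handles only the ValueSpec case and omits flags, generated-file detection, and tests, just to show the plumbing the analysis package takes care of:

package main

import (
	"fmt"
	"go/ast"
	"log"
	"strings"

	"golang.org/x/tools/go/packages"
)

func main() {
	// Load the ASTs (and position information) for the matched packages.
	cfg := &packages.Config{
		Mode: packages.NeedName | packages.NeedSyntax,
	}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		log.Fatal(err)
	}

	// Walk every file of every package and report findings by hand.
	for _, pkg := range pkgs {
		for _, file := range pkg.Syntax {
			ast.Inspect(file, func(n ast.Node) bool {
				spec, ok := n.(*ast.ValueSpec)
				if !ok {
					return true
				}
				for _, id := range spec.Names {
					if id.Name != "_" && strings.ContainsRune(id.Name, '_') {
						fmt.Printf("%s: avoid snake case naming convention\n",
							pkg.Fset.Position(id.Pos()))
					}
				}
				return true
			})
		}
	}
}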

That’s all about the analyzer. Feel free to leave a comment below to provide feedback or share any thoughts.

Thanks for reading.