Nayeem Hasan

Exploring the world of software

Go: Build your own linter

Posted at — Jan 17, 2024

Go provides rich support for lexical analysis, parsing, and type checking of Go packages. Using these tools, we can create our own linter to detect issues or perform refactorings.

To make matters easier, the golang.org/x/tools module provides the analysis package, with which we can create a linter or static analyzer without manually parsing or loading packages. The analysis package provides a nice API for writing the business logic of our linter or static analyzer and testing it effectively.

A simple linter

Let’s build a simple linter that detects a violation of Go’s naming convention. As per the documentation, the convention in Go is to use MixedCaps or mixedCaps rather than underscores to write multiword names. So, this linter will catch uses of underscores in variable names like mixed_caps.

Before diving into the code, we need to find out in which cases we will issue the warning: wherever a variable is being declared. Now, a variable declaration may look like this:

func _() {
    var (
        x = 10  // (1)
    )
    var y int   // (2)
    z := 10     // (3)
}

According to go/ast, the variable declarations in (1) and (2) are DeclStmt nodes, and the short variable declaration in (3) is an AssignStmt.

What is a DeclStmt?

// A DeclStmt node represents a declaration in a statement list.
DeclStmt struct {
    Decl Decl // *GenDecl with CONST, TYPE, or VAR token
}

// A GenDecl node (generic declaration node) represents an import,
// constant, type or variable declaration. A valid Lparen position
// (Lparen.IsValid()) indicates a parenthesized declaration.
GenDecl struct {
    Doc    *CommentGroup // associated documentation; or nil
    TokPos token.Pos     // position of Tok
    Tok    token.Token   // IMPORT, CONST, TYPE, or VAR
    Lparen token.Pos     // position of '(', if any
    Specs  []Spec
    Rparen token.Pos // position of ')', if any
}

A DeclStmt represents a declaration, and its Decl field is of type Decl, which is an interface. It is implemented by *GenDecl, and the GenDecl type contains a Specs field, a slice of Spec, which is also an interface. The Decl interface is also implemented by other types like FuncDecl (which represents a function declaration), but those are not relevant here.

// A ValueSpec node represents a constant or variable declaration
// (ConstSpec or VarSpec production).
ValueSpec struct {
    Doc     *CommentGroup // associated documentation; or nil
    Names   []*Ident      // value names (len(Names) > 0)
    Type    Expr          // value type; or nil
    Values  []Expr        // initial values; or nil
    Comment *CommentGroup // line comments; or nil
}

The Spec interface is implemented by *ValueSpec. A ValueSpec node represents a constant or variable declaration, and its Names field contains all the identifiers in the declaration as *ast.Ident values. So, for this linter, it is enough to check the identifiers in a ValueSpec node.

Now, let’s check what an AssignStmt is.

// An AssignStmt node represents an assignment or
// a short variable declaration.
type AssignStmt struct {
	Lhs    []Expr
	TokPos token.Pos   // position of Tok
	Tok    token.Token // assignment token, DEFINE
	Rhs    []Expr
}

An AssignStmt represents a statement like x = 1 or x := 1. For this linter, we are interested in x := 1, so we check AssignStmt nodes whose Tok has the value token.DEFINE, i.e. :=. Here, DEFINE is a constant of type Token declared inside the go/token package. Also, the Lhs field is a slice of Expr, which is an interface implemented by many expression types. We are only interested in the case where the Expr is an identifier (*ast.Ident).

So, to summarize, we will check the identifiers in every ValueSpec node, and the identifiers on the left-hand side of every AssignStmt whose Tok is token.DEFINE.
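
To see these two shapes concretely before writing the analyzer, here is a small throwaway program (not part of the linter; the file name and package name are just placeholders) that parses a snippet and prints every identifier it finds in ValueSpec nodes and on the left-hand side of := assignments:

package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package demo

func _() {
	var foo_bar = 10
	baz_qux := 20
	_ = foo_bar + baz_qux
}`

func main() {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		panic(err)
	}

	// Visit every node of the AST and print the declared identifiers.
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.ValueSpec: // var or const declaration
			for _, id := range n.Names {
				fmt.Printf("%s: ValueSpec declares %s\n", fset.Position(id.Pos()), id.Name)
			}
		case *ast.AssignStmt: // short variable declaration
			if n.Tok == token.DEFINE {
				for _, lhs := range n.Lhs {
					if id, ok := lhs.(*ast.Ident); ok {
						fmt.Printf("%s: AssignStmt declares %s\n", fset.Position(id.Pos()), id.Name)
					}
				}
			}
		}
		return true
	})
}

Both foo_bar and baz_qux show up, which is exactly the set of names our linter needs to inspect.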

The code

First, we will declare a variable of type Analyzer. According to the documentation of Analyzer,

An Analyzer statically describes an analysis function: its name, documentation, flags, relationship to other analyzers, and of course, its logic.

So, the Analyzer contains some metadata about the linter and a function that holds its business logic.

var Analyzer = &analysis.Analyzer{
	Name:     "varname",                              // (1)
	Doc:      "Check snake case variable naming",     // (2)
	Run:      run,                                    // (3)
	Requires: []*analysis.Analyzer{inspect.Analyzer}, // (4)
}

func run(pass *analysis.Pass) (interface{}, error) { // (5)
    // logic of the linter
}

Now, let’s write the actual logic of the analyzer.

func run(pass *analysis.Pass) (interface{}, error) { // (1)
	anInspector := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector) // (2)

	nodeFilter := []ast.Node{ // (3)
		(*ast.AssignStmt)(nil),
		(*ast.ValueSpec)(nil),
	}
	anInspector.Preorder(nodeFilter, func(n ast.Node) { //(4)
		switch n := n.(type) {
		case *ast.ValueSpec: // (5)
			for _, id := range n.Names {
				if isSnakeCase(id.Name) {
					pass.ReportRangef(n, "avoid snake case naming convention")
				}
			}

		case *ast.AssignStmt: // (6)
			if n.Tok == token.DEFINE {
				for _, lhsExpr := range n.Lhs {
					if id, ok := lhsExpr.(*ast.Ident); ok && isSnakeCase(id.Name) {
						pass.ReportRangef(n, "avoid snake case naming convention")
					}
				}
			}
		}
	})

	return nil, nil
}

func isSnakeCase(s string) bool {
	return s != "_" && strings.ContainsRune(s, '_') // (7)
}
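
Before moving on, it is worth noting that the framework also ships a companion package, golang.org/x/tools/go/analysis/analysistest, for testing analyzers. Test sources live under a testdata directory (GOPATH-style, e.g. testdata/src/a/a.go) and mark expected diagnostics with // want "regexp" comments. A minimal sketch might look like this (the test package name and testdata layout are assumptions):

package varname_test

import (
	"testing"

	"golang.org/x/tools/go/analysis/analysistest"

	"example.com/varname"
)

func TestVarname(t *testing.T) {
	// analysistest.TestData returns the path of the testdata directory
	// next to this test file. Run loads the package pattern "a" from
	// testdata/src/a, runs the analyzer, and checks that the reported
	// diagnostics match the `// want "avoid snake case"` comments in
	// the test sources.
	analysistest.Run(t, analysistest.TestData(), varname.Analyzer, "a")
}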

Adding more functionality

Now, let’s make some improvements to prevent unwanted warnings. First, we want to prevent the analyzer from running on auto-generated files, as they often contain snake case variable names and we do not want to modify them.

A generated file contains a comment like this:

// Code generated by "stringer -type=SomeType"; DO NOT EDIT.

package foo

So, to check whether a file is generated, we need to check the comments in the file. Comments in a Go file are kept under the root element of the AST, which is *ast.File. From the doc,

type File struct {
	...
	Comments           []*CommentGroup // list of all comments in the source file
	GoVersion          string          // minimum Go version required by //go:build or // +build directives
}

So, we need to check the Comments field of the *ast.File. But how can we get the *ast.File? The callback function passed to the Preorder method only receives an ast.Node, and there is no parent or ancestor information associated with it that would let us find the root element, the *ast.File node.

anInspector.Preorder(nodeFilter, func(n ast.Node) {
	// ...
})

Fortunately, *inspector.Inspector provides another method, WithStack, whose callback receives the current traversal stack in the stack parameter. The first element of the stack is always the *ast.File node.

anInspector.WithStack(nodeFilter, func(n ast.Node, push bool, stack []ast.Node) (proceed bool) {
	// ...
})

Let’s modify the analyzer to use the WithStack method.

func run(pass *analysis.Pass) (interface{}, error) {
	
	...

	anInspector.WithStack(nodeFilter, func(n ast.Node, push bool, stack []ast.Node) (proceed bool) {
		if isGeneratedFile(stack[0]) { // (1)
			return false
		}
		switch n := n.(type) {
			...
		}

		return true
	})

	return nil, nil
}

var generatedCodeRe = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`) // (2)

func isGeneratedFile(node ast.Node) bool {
	if file, ok := node.(*ast.File); ok {
		for _, c := range file.Comments {
			if c.Pos() >= file.Package { // (3)
				return false
			}
			for _, cc := range c.List {
				if generatedCodeRe.MatchString(cc.Text) { // (4)
					return true
				}
			}
		}
	}
	return false
}

We can also add a flag to the analyzer to control whether generated files should be analyzed. To do that, we will use the Flags field of the Analyzer.

var Analyzer = &analysis.Analyzer{
	Name:     "varname",
	Doc:      "Check snake case variable naming",
	Run:      run,
	Flags:    flags(), // (1)
	Requires: []*analysis.Analyzer{inspect.Analyzer},
}

var analyzeGenerated *bool

func flags() flag.FlagSet {
	var fs flag.FlagSet
	analyzeGenerated = fs.Bool("analyze-generated", false, "analyze generated file") // (2)
	return fs
}

Now, before checking whether a file is generated, we just add an extra check for whether the flag is enabled.

if !*analyzeGenerated && isGeneratedFile(stack[0]) { // (3)
	return false
}

Running the analyzer

Here is all the code for the analyzer:

package varname

import (
	"flag"
	"fmt"
	"go/ast"
	"go/token"
	"regexp"
	"strings"

	"golang.org/x/tools/go/analysis"
	"golang.org/x/tools/go/analysis/passes/inspect"
	"golang.org/x/tools/go/ast/inspector"
)

var Analyzer = &analysis.Analyzer{
	Name:     "varname",
	Doc:      "Check snake case variable naming",
	Run:      run,
	Flags:    flags(),
	Requires: []*analysis.Analyzer{inspect.Analyzer},
}

var analyzeGenerated *bool

func flags() flag.FlagSet {
	var fs flag.FlagSet
	analyzeGenerated = fs.Bool("analyze-generated", false, "analyze generated file")
	return fs
}

func run(pass *analysis.Pass) (interface{}, error) {
	anInspector := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector)

	nodeFilter := []ast.Node{
		(*ast.AssignStmt)(nil),
		(*ast.ValueSpec)(nil),
	}

	anInspector.WithStack(nodeFilter, func(n ast.Node, push bool, stack []ast.Node) (proceed bool) {
		if !*analyzeGenerated && isGeneratedFile(stack[0]) {
			return false
		}
		switch n := n.(type) {
		case *ast.ValueSpec:
			for _, id := range n.Names {
				if isSnakeCase(id.Name) {
					pass.ReportRangef(n, "avoid snake case naming convention")
				}
			}

		case *ast.AssignStmt:
			if n.Tok == token.DEFINE {
				for _, lhsExpr := range n.Lhs {
					if id, ok := lhsExpr.(*ast.Ident); ok && isSnakeCase(id.Name) {
						pass.ReportRangef(n, "avoid snake case naming convention")
					}
				}
			}
		}

		return true
	})

	return nil, nil
}

var generatedCodeRe = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

func isGeneratedFile(node ast.Node) bool {
	if file, ok := node.(*ast.File); ok {
		for _, c := range file.Comments {
			if c.Pos() >= file.Package {
				return false
			}
			for _, cc := range c.List {
				if generatedCodeRe.MatchString(cc.Text) {
					return true
				}
			}
		}
	}
	return false
}

func isSnakeCase(s string) bool {
	return s != "_" && strings.ContainsRune(s, '_')
}

Save this code in a file in the root directory of the repository. Now, let’s create a main.go file inside the cmd/varname directory and paste:

package main

import (
	"example.com/varname"
	"golang.org/x/tools/go/analysis/singlechecker"
)

func main() {
	singlechecker.Main(varname.Analyzer) // (1)
}

Now run go build inside cmd/varname and we will get an executable which can be run as a CLI app. For example, we can run

varname ./...

inside a Go project to invoke the linter.

So, what is singlechecker.Main doing here? It turns a single analyzer into a standalone command: it parses the command-line arguments and flags (including the flags we registered on the analyzer), loads the packages matched by the given patterns, runs the analyzer on them, and reports the diagnostics.

Also, if we have multiple analyzers, we can invoke all of them using multichecker.Main.
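
As a sketch, a multichecker-based main might look like the following; the printf analyzer is just one of the standard passes shipped with x/tools, included here only as an example of a second analyzer:

package main

import (
	"golang.org/x/tools/go/analysis/multichecker"
	"golang.org/x/tools/go/analysis/passes/printf"

	"example.com/varname"
)

func main() {
	// multichecker.Main builds a single CLI that runs all the given
	// analyzers over the packages named on the command line.
	multichecker.Main(
		varname.Analyzer,
		printf.Analyzer,
	)
}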

Running as a CLI

Create a file with the contents given below:

package main

var foo_bar string

var (
	num_of_var int
)

func _() int {
	sum_of_value := 0
	return sum_of_value
}

Now, run varname ./... and check the output.

/home/nayeem/my-codes/pg/foo.go:3:5: avoid snake case naming convention
/home/nayeem/my-codes/pg/foo.go:6:2: avoid snake case naming convention
/home/nayeem/my-codes/pg/foo.go:10:2: avoid snake case naming convention

Now, run the analyzer on a generated file like below:

// Code generated by "stringer -type=OpType"; DO NOT EDIT.

package main

const _OpType_name = "OpAddOpSubOpMulOpDiv"

var _OpType_index = [...]uint8{0, 5, 10, 15, 20}

Invoking varname does not give any warning for this file, but we can still analyze it with the analyze-generated flag.

varname -analyze-generated .

Now, the warning will show up.

/home/nayeem/my-codes/pg/optype_string.go:17:7: avoid snake case naming convention
/home/nayeem/my-codes/pg/optype_string.go:19:5: avoid snake case naming convention

Summary

With the analysis package, we have created a simple analyzer. It is also possible to do the same thing without the analysis package.

In that case, we have to load and parse the packages ourselves (for example with go/parser or golang.org/x/tools/go/packages), traverse the AST, and build our own flag handling, diagnostic reporting, and testing setup.

With the analysis package, we can get rid of all this boilerplate code and focus only on the actual logic of the analyzer.
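
For comparison, here is a rough sketch of what the manual route might look like, using golang.org/x/tools/go/packages to load the code ourselves. It handles only the ValueSpec case and omits flags, generated-file detection, and tests, just to show the plumbing the analysis package takes care of:

package main

import (
	"fmt"
	"go/ast"
	"log"
	"strings"

	"golang.org/x/tools/go/packages"
)

func main() {
	// Load the ASTs (and position information) for the matched packages.
	cfg := &packages.Config{
		Mode: packages.NeedName | packages.NeedSyntax,
	}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		log.Fatal(err)
	}

	// Walk every file of every package and report findings by hand.
	for _, pkg := range pkgs {
		for _, file := range pkg.Syntax {
			ast.Inspect(file, func(n ast.Node) bool {
				spec, ok := n.(*ast.ValueSpec)
				if !ok {
					return true
				}
				for _, id := range spec.Names {
					if id.Name != "_" && strings.ContainsRune(id.Name, '_') {
						fmt.Printf("%s: avoid snake case naming convention\n",
							pkg.Fset.Position(id.Pos()))
					}
				}
				return true
			})
		}
	}
}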

That’s all about the analyzer. Feel free to leave a comment below to provide feedback or share any thoughts.

Thanks for reading.