Tree Shaking Like a Caveman

Published on

I recently did some refactoring for an app where Apple basically told me, "You need to let users see the app without logging in." I ended up removing most of the onboarding screens, and I'm sure I didn't do a fantastic job cleaning up unused components. Odds are I've left behind some orphaned components that are bloating the size of the app and providing no functionality. Let's fix that.

Mission: Write a script in go that helps us get rid of unused files by tree shaking.

What's tree shaking? MDN describes it:

Tree shaking is a term commonly used within a JavaScript context to describe the removal of dead code.

I guess the idea is that you shake the tree and dead branches fall to the ground. I'd like to write a script that looks at the entry point in a react native app and generates the list of all files that are eventually included. I can compare this list with what's actually present in the project directories to see what files are not actively being used, and delete those that aren't in use.

I'm going to call this doubtfire, because 1. Robin Williams titular character does some cleaning in the 90s classic Mrs. Doubtfire, 2. the movie is still good twenty-something years later, and 3. I imagine that doubtfire is a pretty uncommon name for packages.

I'm imagining that running the script looks like this:

$ go run doubtfire.go
Detecting package namespacing from package.json...
- Package named 'runner'
Recursing on App.js to map dependencies...
68 files included
Top level folders referenced: [navigation components reducers...]
Shaking those directories...
Found 6 unused files:
- navigation/Auth.js
- navigation/Signup.js
- navigation/Login.js
- components/Example.js
- components/foo/Bar.js
- reducers/Auth.js

> Do you want to remove these files?
Type "y" + enter to remove, anything else + enter to exit:
y
Deleting...
- navigation/Auth.js
- navigation/Signup.js
- navigation/Login.js
- components/Example.js
- components/foo/Bar.js
- reducers/Auth.js
Done!

Amazing! I can hardly wait to automate a boring manual process that a more patient person could do by hand in 15 minutes. ✨

How is this going to work?

Let's divide this into a few steps:

  1. Scan the package name from package.json. This will let me easily tell when I'm importing code from my own repository instead of from external packages.
  2. Create a function that accepts a file path and returns the list of imports in that file that are "ours" (reference code that is local to the project). I'll use a regex here to find the well-formatted imports (shocker).
  3. Create a function that (using the above) recursively scans our entire project for any dependencies from the entry point to build our "component inclusion tree".
  4. Create a function that makes a list of local files based on a list of directories to look in (I have to exclude node_modules somehow).
  5. Compare the difference between the local files and included files to find those no longer in use. Delete 'em!

I'll leave the signatures until later so I can properly motivate what each of them will be doing.

High level caveats

1. I'm going to assume all local imports are prefixed with the package name and do not use relative paths. Let's also insist that they're on single lines.

// GOOD
import { Something } from 'runner/components/Something'

// BAD
import { Something } from './components/Something'
import { Something } from './components/Something'

It's possible to do the relative path arithmetic (that's not a thing, yet...), but it's needlessly complex, and relative path imports don't really offer any advantages that I'm aware of over namespacing component imports live I've shown above. Similarly, feel free to improve the regex to catch more complicated imports if you've just poured yourself a fresh cup of Club-Mate.

2. I'm going to assume there's no fancy index files that bundle together a bunch of imports for convenience.

// MORE BAD
// components/index.js
import Something from 'runner/components/Something'
export { Something }

// anywhere else
import { Something } from 'runner/components'

Again, I could handle this, but I'd end up having to special case our mapping logic to say, "that file might not exist, so go see if there's a file inside a directory of that name with path .../index.js". Again, I'm leaving this as an exercise for the reader, which I'm now realizing is blog for "is too complicated to implement while maintaining reader interest", or possibly "I am lazy and out of coffee".

3. I'm going to assume your project structure looks similar to mine. I'm running this inside a react native app created with Expo. My project directory looks like this:

$ tree -L 1
.
├── App.js
├── README.md
├── __tests__
├── actions
├── app.json
├── assets
├── components
├── fastlane
├── navigation
├── package.json
├── reducers
├── screens
├── theme.js
├── utils
└── yarn.lock

It shouldn't really matter if yours has some different directories to scan, but if your entry point isn't a top level file called App.js, you're going to have to adjust your own version of the script accordingly.

Setup

I'm assuming you have go 1.x installed on your machine. If it isn't and you want to follow along, go do that. I'm using atom, and have go-plus installed. As someone who hasn't used IDEs much, this package makes working in go fun. Consider grabbing it.

Huge. With all that out of the way, let's get started with the package name.

Parsing the package name

Namespace your react native project. Stop using relative imports. Do it! (caution: LOUD).

Once you do, grabbing the name field is easy as pie. Take a look:

package main

import (
  "fmt"
  "io/ioutil"
  "encoding/json"
)

type manifest struct {
  Name string `json:"name"`
}

func main() {
  var m manifest

  fmt.Println("Detecting package namespacing from package.json...")
  jsonFile, err := ioutil.ReadFile("package.json")

  if err != nil {
    fmt.Println("Failed to open users.json", err)
    return
  }

  json.Unmarshal([]byte(jsonFile), &m)

  if (m.Name == "") {
    fmt.Println("No package name found, exiting!")
    return
  }

  fmt.Printf("- Package named '%s'\n", m.Name)
}

I only care about the name field in the package.json file, so I only need to define that in the struct I'm writing to. Everything looks good when I run it locally:

$ go run doubtfire.go
Detecting package namespacing from package.json...
- Package named 'runner'

If you don't have the field in your package.json, the script informs you of that fact and exits. How reasonable.

Creating the regular expression

I'm going to scan for lines that have import ... from 'runner/...' in order to see what packages are used locally (your package name is unlikely to match). I want to capture the rest of the path from the import, which should be the path that the file exists at (suffixed with .js). Recall that I'm not dealing with index.js files here, so nothing too fancy to guard for here.

Let's quickly make sure my regex returns matches that I expect:

package main

import (
  "fmt"
  "io/ioutil"
  "encoding/json"
  "regexp"
)

// ...

func regexTest() {
  re := regexp.MustCompile(`^import .* from [\"']runner/(.*)[\"']$`)
  one := "import { View, Text } from 'react-native'"
  two := "import rootReducer from 'runner/reducers/root'"
  fmt.Printf("1: %q\n", re.FindSubmatch([]byte(one)))
  fmt.Printf("2: %q\n", re.FindSubmatch([]byte(two)))
}

func main() {
  regexTest()

  return
  // ...
}

Does this mostly match my primitive use case?

$ go run doubtfire.go
1: []
2: ["import rootReducer from 'runner/reducers/root'" "reducers/root"]

Looks good. I've captured the file path I care about, and can use that to recurse. Full steam ahead!

Scanning one file

Let's use some real lines by testing my regex against each line of App.js to see what matches I generate. To do this, I'm writing a new function that has the following signature:

func scanFileForImports(path, pName string) ([]string, error)

I want a function that accepts a path to a file and the package name, and then attempts to open that file and scans each line for local imports. I'll have to do the .js suffixing somewhere, but I can handle that when I recurse, since I will always start by calling this on App.js.

package main

import (
  "os"
  "fmt"
  "io/ioutil"
  "bufio"
  "encoding/json"
  "regexp"
)

// ...

func scanFileForImports(path, pName string) ([]string, error) {
  var targets []string

  re := regexp.MustCompile(`^import .* from [\"']` +
    pName + `/(.*)[\"']$`)

  file, err := os.Open(path)
  defer file.Close()

  if err != nil {
    fmt.Printf("Failed opening file: %s\n", err)
    return targets, err
  }

  scanner := bufio.NewScanner(file)
  scanner.Split(bufio.ScanLines)

  for scanner.Scan() {
    line := scanner.Text()
    matches := re.FindSubmatch([]byte(line))

    if len(matches) != 0 {
      targets = append(targets, string(matches[1]))
    }
  }

  return targets, nil
}

func main() {
  // ...

  fmt.Printf("- Package named '%s'\n", m.Name)
  fmt.Println("Recursing on App.js to map dependencies...")
  e, ms := scanFileForImports("App.js", m.Name)

  if e != nil {
    return
  }

  fmt.Println("Found targets:")
  for _, eachline := range ms {
    fmt.Println(eachline)
  }

  return

  // ...

My abbreviated script happily spits out some targets to recurse on:

$ go run doubtfire.go
Found targets:
reducers/root
theme
navigation/RootNav
screens/FlashWrapper

Sweet! So now I just have to rescan those (suffixed) files recursively, and I'll have a tree structure representing our dependency map.

Recursing from App

At this point, I'd like to start thinking about how we're going to store the results we've found. I eventually need a flat array of files I've found that are unique, but I think there is other data we might want to store along the way. Here's what I'm proposing:

type component struct {
  path	string
  depth	int
  deps 	[]component
}

I'm assuming that I'll naively scan the entire dependency structure first, rather than try to only store each component once while reading in the relationships. I'll have to flatten and remove duplicates once we're done, but this way, if I wanted to do any fancy printing as I recursed, it would be straightforward.

// take a filepath and depth, and return it as a component
// with child components filled out
func recurse(path, pName string, depth int) (component, error) {
  c := component{depth: depth, path: path}
  imps, e := scanFileForImports(path, pName)

  if e != nil {
    fmt.Printf("Failed to recurse on file: %s\n", path)
    return c, e
  }

  if len(imps) != 0 {
    var cs []component
    for i := 0; i < len(imps); i++ {
      jsPath := imps[i] + ".js"
      comp, err := recurse(jsPath, pName, depth + 1)
      if err != nil {
        return c, err
      }
      cs = append(cs, comp)
    }
    c.deps = cs
  }

  return c, nil
}

I'm creating the component struct for a given filepath, calculating the imports, and then recursing on any children1 to continue mapping dependencies. In the base case, where I'm running this function on a leaf component (one that imports nothing), len(imps) is 0, so I return the component without recursing.

Flattening our component tree

At this point, I've created a verbose tree structure mapping our dependencies stemming from App.js. In order to compare these to our list of local files I'm about to generate, let's first flatten the structure and dedup anything included from multiple components.

// ...

// a tiny helper function for preserving the min depth
func min(x, y int) int {
  if x < y {
    return x
  }
  return y
}

// flatten takes a component and flattens it + deps into a unique map
// from pathname to the minimum depth
func flatten(c component) (map[string]int, error) {
  m := make(map[string]int)

  m[c.path] = c.depth

  for i := 0; i < len(c.deps); i++ {
    toAdd, e := flatten(c.deps[i])

    if e != nil {
      return m, e
    }
    for k, v := range toAdd {
      if val, ok := m[k]; ok {
        m[k] = min(val, v)
      } else {
        m[k] = v
      }
    }
  }

  return m, nil
}

// ...

func main() {
  // ...
  fmt.Printf("- Package named '%s'\n", m.Name)
  fmt.Println("Recursing on App.js to map dependencies...")

  c, e := recurse("App.js", m.Name, 0)

  if e != nil {
    fmt.Printf("Failed to recurse: %v\n", e)
    return
  }

  uniqueHash, err := flatten(c)
  fmt.Printf("%v", uniqueHash)

  return
  // ...
}

I'm using a map here (essentially a hash or dictionary) to mark each instance of a dependency only once. I'm also storing the minimum depth as the value in the map (though I have no specific use for it at this point).

Rad! If you've been following along at home, your version of the script should now do something like the following:

$ go run doubtfire.go
Detecting package namespacing from package.json...
- Package named 'runner'
Recursing on App.js to map dependencies...
map[App.js:0 actions/AlarmsActions.js:5 actions/FlashActions.js:2...

Listing local files

Now that I have my unique list of included files, I need to generate the list of all files that are local so I can compute the difference and find dead brances. But! I'm only going to pay attention to folders that contain files that are being used (which should hopefully exclude node_modules). Let's quickly scan the unique hash with another regex to generate that list:

// ...

// only care about the top level folder, which we will scan recursively
var folder = regexp.MustCompile(`^([a-zA-Z]*)/.*$`)

func containsFolder(path string) string {
  matches := folder.FindSubmatch([]byte(path))
  // succesful match returns array of [full path, capture group]
  // ["some/folder/filepath", "some"]
  if len(matches) > 0 {
    return string(matches[1])
  }
  return ""
}

// returns a list of folder prefixes from our hash
func getFolderList(m map[string]int) []string {
  var stringList []string
  structList := make(map[string]struct{})

  for k := range m {
    target := containsFolder(k)
    // if there is a directory target, try to add it
    if target != "" {
      if _, ok := structList[target]; !ok {
        structList[target] = struct{}{}
      }
    }
  }

  for k := range structList {
    stringList = append(stringList, k)
  }

  return stringList
}

// ...

Notice that I'm only storing an empty struct (rather than a bool), because presence of the key marks presence of the file for our purposes here. I've shamelessly lifted this from Stack Overflow, here's the link and quote (from "The Go Programming Language"):

The struct type with no fields is called the empty struct, written struct. It has size zero and carries no information but may be useful nonetheless. Some Go programmers use it instead of bool as the value type of a map that represents a set, to emphasize that only the keys are significant, but the space saving is marginal and the syntax more cumbersome, so we generally avoid it.

Like Mr. Powers, we also like to live dangerously. Choose your own adventure.

At this point, we'll have an array of strings containing the folders I want to list files for. So let's see what the code looks like that does that:

import (
  "path/filepath"
  // ...
)

func listFilesInFolderRecursively(folder string) ([]string, error) {
  paths := make([]string, 0)

  e := filepath.Walk(folder,
    func(path string, info os.FileInfo, err error) error {
      if err != nil {
        return err
      }

      // only return files, not directories
      if !info.IsDir() {
        paths = append(paths, path)
      }
      return nil
    })

  return paths, e
}

filepath.Walk runs the function on each descendant file, so all I have to do is check that there's no error and save each path if it's not a directory. At this point, I should have both the flat list of imports in the project, and the list of all files from those referenced folders. Let's compute the difference between the two sets and delete some dead code!

Computing the dead branches

I'm going to make a quick map of bools of all the files present in the project, defaulting them all as false. I'll then update those that are in the dependency map to be true, which means any that are still false are unused.

func generateUsageMap(folderList []string) (map[string]bool, error) {
  fileUsageMap := make(map[string]bool)

  for _, v := range folderList {
    children, err := listFilesInFolderRecursively(v)
    if err != nil {
      fmt.Printf("Failed to list files: %v \n", err)
      return fileUsageMap, err
    }

    for _, child := range children {
      fileUsageMap[child] = false
    }
  }

  return fileUsageMap, nil
}

func printUnused(fs []string) {
  fmt.Printf("Found %d unused files:\n", len(fs))

  for _, f := range fs {
    fmt.Printf("- %s\n", f)
  }
}

func main() {
  // ...

  list := getFolderList(uniqueHash)

  fmt.Printf("Top level folders referenced: %v\n", list)
  fmt.Println("Shaking those directories...")

  fileUsageMap, err := generateUsageMap(list)

  if err != nil {
    fmt.Printf("Failed to create map of possible files %v\n", err)
    return
  }

  for k := range uniqueHash {
    fileUsageMap[k] = true
  }

  unusedFiles := make([]string, 0)

  for k, v := range fileUsageMap {
    if !v {
      unusedFiles = append(unusedFiles, k)
    }
  }

  printUnused(unusedFiles)
  return
}

If I run the script now, I see the following output:

Detecting package namespacing from package.json...
- Package named 'runner'
Recursing on App.js to map dependencies...
86 files included
Top level folders referenced: [components utils reducers navigation actions screens]
Shaking those directories...
Found 14 unused files:
- components/common/LabeledInput.js
- components/LocationAndNumber.js
- components/common/TriToggle.js
- components/Expander.js
- components/common/ImageToggle.js
- components/Fade.js
- components/Label.js
- components/alarms/AlarmWell.js
- utils/ignore.js
- components/common/Placeholder.js
- components/common/IconButton.js
- components/judgements/JudgementIcons.js
- utils/emailValidation.js
- components/common/ColorTextBlurb.js

Voila. All we have left to do is to delete the files.

Safely removing files

A user might want to manually check that the files aren't in use with some grepping, so I'm going to make the user confirm they want to delete those files before we do so for them. I'm using git, but it's possible someone isn't using version control and wouldn't easily be able to recover deleted files. To avoid a nasty surprise, we'll make the user confirm the files to deletion.

func removeFiles(unused []string) error {
  fmt.Println("Deleting...")

  for i := 0; i < len(unused); i++ {
    if removeErr := os.Remove(unused[i]); removeErr != nil {
      return removeErr
    }
    fmt.Printf("- %s\n", unused[i])
  }

  fmt.Println("Done!")
  return nil
}

// ...

func main() {
  // ...
  printUnused(unusedFiles)

  if len(unusedFiles) > 0 {
    fmt.Println("\n> Do you want to remove these files?")
    fmt.Println(`Type "y" + enter to remove, anything else + enter to exit:`)

    scanner := bufio.NewScanner(os.Stdin)

    for scanner.Scan() {
      t := scanner.Text()
      if t == "y" {
        if err = removeFiles(unusedFiles); err != nil {
          fmt.Printf("Failed to remove files: %v\n", err)
        }
      } else {
        fmt.Println("Exiting...")
      }
      return
    }
  } else {
    fmt.Println("No files to remove! Exiting...")
  }
}

As long as there are unused files to delete, I prompt the user for input and conditionally handle the result. All that's left to do is run it...

Running it against my project

$ go run doubtfire
Detecting package namespacing from package.json...
- Package named 'runner'
Recursing on App.js to map dependencies...
86 files included
Top level folders referenced: [utils screens components navigation reducers actions]
Shaking those directories...
Found 14 unused files:
- components/common/Placeholder.js
- components/judgements/JudgementIcons.js
- components/alarms/AlarmWell.js
- components/common/IconButton.js
- components/Expander.js
- components/Label.js
- components/Fade.js
- components/common/ColorTextBlurb.js
- utils/emailValidation.js
- components/common/ImageToggle.js
- utils/ignore.js
- components/LocationAndNumber.js
- components/common/LabeledInput.js
- components/common/TriToggle.js

> Do you want to remove these files?
Type "y" + enter to remove, anything else + enter to exit:
y
Deleting...
- components/common/Placeholder.js
- components/judgements/JudgementIcons.js
- components/alarms/AlarmWell.js
- components/common/IconButton.js
- components/Expander.js
- components/Label.js
- components/Fade.js
- components/common/ColorTextBlurb.js
- utils/emailValidation.js
- components/common/ImageToggle.js
- utils/ignore.js
- components/LocationAndNumber.js
- components/common/LabeledInput.js
- components/common/TriToggle.js
Done!

14 / 100 files weren't in use! That's not damning for a tiny project, but in a larger project, having tools to help keep my project well maintained is increasingly important. And now I have a few hundred lines of go that help me refactor more confidently! Huzzah.

What did we learn?

  1. Tree shaking is a way of removing dead code (implemented via a simple regex and recursion)
  2. Tooling to maintain repository hygiene can be useful, even for small projects
  3. At least one of us had fun scripting in go 😊

Source code

Here's a link to the code in a single file on github - go wild.

Ideas and improvements

  • Add support for relative imports
  • Add support for multiline imports
  • Add support for index files
  • Add import loop detection (I think your device emulator would probably choke on this if you tried to run it, but could be a fun experiment)
  • Detect included but unused imports in .js files
  • Change the script to not be interactive (add a --force flag to remove unused files immediately - take a look at kong or something similar)

A non-interactive script might be more useful in the long run, because you could add it as part of a build process without having to feed in the expected "y" during execution.

Closing thoughts

Am I writing particularly idiomatic go? Unlikely.

Would this have been quicker or easier in ruby or python? Probably.

Was it fun anyway? Absolutely.

Let me know if you have any thoughts (or corrections) on email or Twitter.