Preventing Salesforce Regression: Introducing the Apex Linter

TL;DR: We’re sharing some of our Salesforce learnings in the form of a linter (code checker). The linter flags things that have caused problems for us in Apex, the programming language used in Salesforce. It flags a tricky problem with Map and Set, the use of futures and SeeAllData in unit tests, and the obsolete ‚testMethod‚ keyword. Because Map and Set are used a lot, that check can be suppressed on a line-by-line basis by putting a comment with a link on back to a document about it in the GitHub open source project. Now, back to our story…

While investigating a fortunately rare bug that caused some headache for the business and had eluded several developers, my investigation led to code that was using Salesforce’s hash table, simply called Map. As I pored over documentation, one line on this page stopped me dead in my tracks:

Uniqueness of map keys of user-defined types is determined by the equals and hashCode methods, which you provide in your classes. Uniqueness of keys of all other non-primitive types, such as sObject keys, is determined by comparing the objects’ field values.

As it turns out, sObject hash code computation also uses field values, just like its implementation of equals. Two sObjects of the same type with the same field values that have not been inserted into the database have the same hash code and are equal.

Recall that hash table and set work on similar principles: the object’s hash code is used to locate a bucket, and then equals is used to match a key to a value within the bucket, since many keys may hash to the same bucket index in the underlying array.

This means that sObjects that have the same field values and have not been inserted into the database (id == null) look the same to Maps and Sets. Furthermore, changing the field values of an sObject will change its hash code.

If the sObject happens to be a Map key, its value will almost certainly become inaccessible because the change in hash code will direct the lookup to a different bucket. If it’s a Set member, contains will likely return false fo the same reason.

Here’s an example that you can run as Anonymous Apex:

Most programmers are accustomed to Java’s default implementations for hashCode and, more importantly, equals, both of which effectively use memory locations and therefore proxy object identity. Although it’s certainly possible to implement hashCode and equals to use field values, like sObject, it’s not expected.

The upshot of all this is that one has to be careful about Map keys and Set members that are non-base types. This is true for Java, but it’s especially true for Apex because of the special case of sObjects.

This led to the first production Apex Linter, which was a simple Bash script that used regular expressions to flag Maps using non-base types as keys. We created a Jenkins project for it and ran it using a GitHub Pull Request Builder. Of course we had to clean up a lot of code, but in most cases we found that the Map wasn’t needed at all, and that a List would could often replace a Set.

In cases where the Map or Set is used within one method, especially in tests, we wanted an easy way to override the linter’s warning on a line-by-line basis. You can do this by adding a link to the documentation in a comment on the same line:

https://github.com/quantcast/apexlint/blob/master/MAPS-AND-SETS.md

Often, we add „known risk“ to the comment because everything is sweetened by risk. Except software development.

My colleague Simon Law got excited about making it even better, and ported my two scripts into a single Python module to make it easier to add new validators and test them. It even has color syntax highlighting! This is the version we’ve released as an open-source project:

https://github.com/quantcast/apexlint

It flags the following cases:

  • Object types as Map keys or Set members
    • sObject hashes are computed from field values. Changes to field values of keys make Map values inaccessible. For Set, it causes Set.contains() to return false.
    • Object hashes default to memory locations
    • These differences produce subtle bugs, so we recommend against the practice. However, we do allow it with a link to the issue documentation.
  • SeeAllData in tests
    • Creates situations in which test execution can conflict with production data.
    • Encourages DML in tests, which slows them down.
    • Production systems that lock rows used by tests can cause them to fail, causing deployments to fail randomly. Conversely, tests that lock rows used by production systems can cause production processes to fail.
    • SeeAllData=false doesn’t do anything in classes annotated with SeeAllData=true!
  • No @future in test classes
    • Futures are scheduled in a small finite queue which is shared across the org. As with SeeAllData, the contention can lead to bad behavior. It also makes your tests arbitrarily slow, leading to longer deployment times.
    • If „Disable Parallel Test Execution“ is off, this queue can get full very quickly.
    • We recommend that you use @testSetup instead of @future to avoid „mixed DML“ issues, and use Test.startTest() and Test.stopTest() to avoid „Too Many SOQL Queries.“
  • A check for the obsolete testMethod keyword
    • Use the @isTest annotation instead.

Mercifully, we only found one bug caused by improper keys with Map in our code, and in general we’ve found that most uses of Map weren’t even necessary.

Although it was a lot of work to remove SeeAllData and @future from our tests, these changes cut our deployment time in half! We went from only being able to schedule only one deployment per day to as many as five!

We hope you’ll benefit from these suggestions.