Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Maurits Thinks Aloud - Maurits Rijk
Syndicate content Maurits thinks aloud
Stuff I care about
Updated: 1 day 7 hours ago

Refactoring Java using Clojure with the Eclipse Java development tools (JDT)

Sun, 03/10/2013 - 15:17

Not a very catchy title this time, since this post will be mostly about hardcore nerdy coding. In my previous post I talked about the business value of elegant code. I argued that cleaning up an existing codebase in the maintenance phase still makes a lot of sense, but only if it can be done cheaply. One of the ways to make it cheap (cost-effective might be a better word if you need to sell this to your management) is of course to automate the refactoring process.

The problem

By automating I mean automating detection of code smell and automating fixing this smell. This in contrast to tools that are very good at detecting but don’t fix anything. For example Sonar. Also in contrast to most IDE’s that allow you to select a piece of code and select for example the ‘Extract method‘ action from the menu. That certainly is helpful, but your IDE will probably not detect if that is needed. It will just execute what you tell it to do.

Let me first show you an example of some Java code that I would like to refactor automatically:

public class A {
   int answer() {
      return (42);
   }
}

I case you already haven’t noticed: in the code above the return statement has an extra pair of parenthesis. You can find many discussions on the internet on why this is or isn’t a good thing, but personally I don’t like them for the simple reason that return is a keyword and not a function. Extra parenthesis just add visual noise. So what I would like to see instead is:

public class A {
   int answer() {
      return 42;
   }
}

This simple refactoring is already surprisingly difficult if you want to do this with a set of regular expressions since you need the context of the return statement. For example you have to be sure it’s not part of a comment or a string. The alternative is to fully parse the code and create an abstract syntax tree (AST). Again you can create one yourself using for example ANTLR and a grammar for the language of your choice. I decided to use the Eclipse Java development tools in combination with Clojure. The scope of my experiment: being able to refactor above example.

Preparation

I got the idea for this blogpost from ‘A complete standalone example of ASTParser‘. This post lists all the Eclipse libraries you need. You will have to add these libraries (8) to your own local Maven repository. Assuming you are using Leiningen 2, I followed these steps described by Stuart Sierra. For example to add an artifact to my local Maven repository called maven_repository within my Leiningen project, I used:

mvn deploy:deploy-file -DgroupId=org.eclipse -DartifactId=text -Dversion=3.5.200 \
-Dpackaging=jar -Dfile=/Users/maurits/development/eclipse/plugins/org.eclipse.text_3.5.200.v20120523-1310.jar \
-Durl=file:maven_repository

I added all the dependencies do my project file. It looks like this:

(defproject ast "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.0"]
                 [org.eclipse.core/contenttype "3.4.200"]
                 [org.eclipse.core/jobs "3.5.300"]
                 [org.eclipse.core/resources "3.8.1"]
                 [org.eclipse.core/runtime "3.8.0"]
                 [org.eclipse.equinox/common "3.6.100"]
                 [org.eclipse.equinox/preferences "3.5.0"]
                 [org.eclipse.jdt/core "3.8.2"]
                 [org.eclipse/osgi "3.8.1"]
                 [org.eclipse/text "3.5.200"]]
  :repositories {"local" ~(str (.toURI (java.io.File. "maven_repository")))})

Now that the preparations are done we can finally start to write the actual refactoring code.

Refactor it!

First the code to create the AST from the Java code and the actual example I am going to use:

(ns ast.core
  (:import (org.eclipse.jdt.core.dom ASTParser AST ASTNode)))

(defn create-ast
  "Create AST from a string"
  [s]
  (let [parser (ASTParser/newParser(AST/JLS3))]
    (.setSource parser (.toCharArray s))
    (.createAST parser nil)
    ))

(def example
     (create-ast (str
                  "public class A {"
                  "   int foo() {"
                  "      return (42);"
                  "   }"
                  ""
                  "   int bar() {"
                  "      return 13;"
                  "   }"
                  "}")))

As you will notice creating an AST with the Eclipse JDT only takes a very few lines of code: on line 7 I create a JLS3 (Java Language Specification 3) parser, Line 8 tells the parser where it will get its source (in this case a string) and line 9 creates the AST. Next I need some helper functions:

(defn parenthesized-expression? [expr]
  (= (.getNodeType expr) ASTNode/PARENTHESIZED_EXPRESSION))

(defn return-statement? [stmt]
  (= (.getNodeType stmt) ASTNode/RETURN_STATEMENT))

(defn parenthesized-return-statement? [stmt]
  (and (return-statement? stmt)
       (parenthesized-expression? (.getExpression stmt))))

(defn method-declaration? [body]
  (= (.getNodeType body) ASTNode/METHOD_DECLARATION))

Details about for example ASTNode can be found in the Eclipse JDT API Specification

Next the are 4 functions that zoom in into the code we would like to refactor. You will notice that I use doseq quite a lot since the actual AST manipulation (the refactoring) will be in-place and thus has side effects. This is not always avoidable when using Java libraries from within your Clojure code. We could write an immutable version that leaves the original AST intact by returning copies of the AST though. Such functionality is supported by the Eclipse JDT.

(defn refactor-block [block]
  (doseq [stmt (filter parenthesized-return-statement? (.statements block))]
    (refactor-return stmt)))

(defn refactor-method [method]
  (refactor-block (.getBody method)))

(defn refactor-type [type]
  (doseq [method (filter method-declaration? (.bodyDeclarations type))]
    (refactor-method method)))

(defn refactor [ast]
  (doseq [type (.types ast)]
    (refactor-type type)))

As you can see we filter out the return statements that need refactoring using the parenthesized-return-statement? predicate. Only thing left is to do the actual refactoring:

(defn refactor-return [stmt]
  (let [exp (.getExpression (.getExpression stmt))
        ast (.getAST exp)
        node (ASTNode/copySubtree ast exp)]
    (.setExpression stmt node)))

In this code we first get to the expression within the parentheses, hence the double .getExpression. Note: this code only strips one level of parentheses. Next we make a copy of the expression and finally we assign it back to our return statement, effectively removing the outer parentheses.

This code is easy to test via the REPL. You will see something similar to:

ast.core=> example
#<CompilationUnit public class A {
  int foo(){
    return (42);
  }
  int bar(){
    return 13;
  }
}
>
ast.core=> (refactor example)
nil
ast.core=> example
#<CompilationUnit public class A {
  int foo(){
    return 42;
  }
  int bar(){
    return 13;
  }
}
>
ast.core=>

Finally the complete code:

(ns ast.core
  (:import (org.eclipse.jdt.core.dom ASTParser AST ASTNode)))

(defn create-ast
  "Create AST from a string"
  [s]
  (let [parser (ASTParser/newParser(AST/JLS3))]
    (.setSource parser (.toCharArray s))
    (.createAST parser nil)
    ))

(def example
     (create-ast (str
                  "public class A {"
                  "   int foo() {"
                  "      return (42);"
                  "   }"
                  ""
                  "   int bar() {"
                  "      return 13;"
                  "   }"
                  "}")))

(defn parenthesized-expression? [expr]
  (= (.getNodeType expr) ASTNode/PARENTHESIZED_EXPRESSION))
  
(defn return-statement? [stmt]
  (= (.getNodeType stmt) ASTNode/RETURN_STATEMENT))

(defn parenthesized-return-statement? [stmt]
  (and (return-statement? stmt)
       (parenthesized-expression? (.getExpression stmt))))

(defn method-declaration? [body]
  (= (.getNodeType body) ASTNode/METHOD_DECLARATION))

(defn refactor-return [stmt]
  (let [exp (.getExpression (.getExpression stmt))
        ast (.getAST exp)
        node (ASTNode/copySubtree ast exp)]
    (.setExpression stmt node)))

(defn refactor-block [block]
  (doseq [stmt (filter parenthesized-return-statement? (.statements block))]
    (refactor-return stmt)))

(defn refactor-method [method]
  (refactor-block (.getBody method)))

(defn refactor-type [type]
  (doseq [method (filter method-declaration? (.bodyDeclarations type))]
    (refactor-method method)))

(defn refactor [ast]
  (doseq [type (.types ast)]
    (refactor-type type)))

As always, don’t hesitate to leave comments or email if you have questions/remarks/suggestions.

Have fun!


Is there business value in elegant code?

Mon, 02/18/2013 - 22:38

Recently I run into the following Java code (actual variable names replaced by x and y to anonymize):

if (x != 0) {
  y = 4 - x;
} else {
  y = 4;
}

So what’s the problem with this code? There is the use of the magic number 4, maybe we could test the x for equality to zero and swap the if and the else condition. And of course the whole test is nonsense and this 5 line code snippet could be just as well be replaced by:

  y = 4 - x;

But again, is there a problem with the original code? The code works as expected. The compiler will probably optimize this code anyhow and get rid of that test statement. The unit test and functional tests passed. So at least from a business (or end user if you will) point of view no problem at all.

And of course there are costs involved with refactoring the above code. If this happens during active development costs are probably quite low and the benefits will outweigh those costs. But if this is a maintenance project the situation might be different. The actual code is already in production and might be mission critical for the company. And if there is no (fully automated) continuous delivery in place the cost of this seemingly small code improvement can be one or two orders of magnitude higher than during development.

24093_old_measuring_instrument_for_navigation_isolated

Now lets try to make a business case for the above code change in a maintenance project. If you did an MBA you learned that business case = benefits – costs. An MBA is hardly more than that. Usually this formula results in overly optimistic hockeystick curves with an ROI of less than a year. But I digress. So we have to do two things here: we have to minimize the costs and maximize the benefits.

To start with the benefits. You can make a case that refactoring your code will usually result in less code and certainly in higher quality. If the code becomes more pleasant to work with it will also result in more motivated and productive developers instead of the attitude “I am not going to touch this mess that John left behind unless I really have to.” In general: cleaning up your code base and reducing it to half its size will leave you with about 1/4 of the maintenance costs. That might sound ambitious but over the last 20 years I have seen a lot of non-trivial pieces of software (ranging from 10k – 500k LOCs) for which this was no problem at all.

On the costs side of the equation you need a couple of things to minimize these costs:

  • Continuous delivery: automated building, unit testing, functional testing, integration testing, performance testing, version control, automated deployment, etc. Push hard to get these into place in your organisation. If you really can’t have continuous delivery on short notice, at least make sure that you have tests covering the code that you are going to change. Avoid the temptation to do ‘easy changes’ without them.
  • Any modern IDE will help for trivial refactoring. I call them trivial, because most if not all IDE’s only support refactoring on the syntax level. You can extract methods, rename variables, etc. But the IDE has no clue of what the code actually means. So refactoring on a semantic or design level is mostly not supported.
  • Automate your refactoring! This seems to contradict the previous point but there are tools available (usually as plug-ins for Java IDE’s) that can take refactoring one step further by looking at the AST (Abstract Syntax Tree) of your code and recognise patterns. I wouldn’t be surprised if they could detect and fix the example I started with.

In a next blogpost I will go into detail on what levels of refactoring there exist and how they could (at least theoretically) be automated. Until then:

Have fun refactoring your legacy code base!

(You might be wondering why I inserted the image of an old instrument in this post. My point is that in the past design of many things went far beyond pure functionality. For example scientific devices were often real pieces of art. I like code to be more than ‘just functional’ and have a certain elegancy about it.)