Ex Manga Downloadr - Part 4: Learning through Refactoring

2015 December 03, 17:40 h - tags: beginner exmangadownloadr english learning elixir

Yesterday I added Mangafox support to my downloader tool and it also added a bit of dirty code into my already not-so-good coding. It's time for some serious cleanup.

You can see everything I did since yesterday to clean things up through Github's awesome compare page

First things first: now the choice to have added a reasonable amount of tests will pay off. In this refactoring I changed function signatures, response formats, moved a fair amount of code around, and without the tests this endeavor would have taken me the entire day or more, rendering the refactor efforts questionable to begin with.

At each step of the refactoring I could run "mix test" and work until I ended up with the green status:

1
2
Finished in 13.5 seconds (0.1s on load, 13.4s on tests)
12 tests, 0 failures

The tests are taking long because I made a choice for the MangaReader and Mangafox unit tests to actually go online and fetch from the sites. It takes longer to run the suite but I know that if it breaks and I didn't touch that code, the source websites changed their formats and I need to change the parser. I could have added fixtures to make the tests run faster, but the point in my parser is for them to be correct.

Macros to the Rescue!

Each source module has 3 sub-modules: ChapterPage, IndexPage and Page. All of them have a main function that resembles this piece of code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
defmodule ExMangaDownloadr.Mangafox.ChapterPage do
  require Logger
  ...
  def pages(chapter_link) do
    Logger.debug("Fetching pages from chapter #{chapter_link}")
    case HTTPotion.get(chapter_link, [headers: ["User-Agent": @user_agent, "Accept-encoding": "gzip"], timeout: 30_000]) do
      %HTTPotion.Response{ body: body, headers: headers, status_code: 200 } ->
        body = ExMangaDownloadr.Mangafox.gunzip(body, headers)
        { :ok, fetch_pages(chapter_link, body) }
      _ ->
        { :err, "not found"}
    end
  end
  ...
end

(Listing 1.1)

It calls "HTTPotion.get/2" sending a bunch of HTTP options and receives a "%HTTPotion.Response" struct that is then decomposed to get the body and headers. It gunzips the body if necessary and goes to parse the HTML itself.

Similar code exists in 6 different modules, with different links and different parser functions. It's a lot of repetition, but what about making the above code look like the snippet below?

1
2
3
4
5
6
7
8
9
defmodule ExMangaDownloadr.Mangafox.ChapterPage do
  require Logger
  require ExMangaDownloadr

  def pages(chapter_link) do
    ExMangaDownloadr.fetch chapter_link, do: fetch_pages(chapter_link)
  end
  ...
end

(Listing 1.2)

Changed 9 lines to just 1. And by the way, this same line can be written like this:

1
2
3
ExMangaDownloadr.fetch chapter_link do
  fetch_pages(chapter_link)
end

Seems familiar? It's like every block in the Elixir language, you can write it in the "do/end" block format or the way it really is under the covers: a keyword list with a key named ":do". And the way this macro is defined is like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
defmodule ExMangaDownloadr do
  ...
  defmacro fetch(link, do: expression) do
    quote do
      Logger.debug("Fetching from #{unquote(link)}")
      case HTTPotion.get(unquote(link), ExMangaDownloadr.http_headers) do
        %HTTPotion.Response{ body: body, headers: headers, status_code: 200 } ->
          { :ok, body |> ExMangaDownloadr.gunzip(headers) |> unquote(expression) }
        _ ->
          { :err, "not found"}
      end
    end
  end
  ...
end

(Listing 1.3)

There is a lot of details to consider when writing a macro and I recommend reading the documentation on Macros. The code is basically copying the function body from the "ChapterPage.pages/1" (Listing 1.1) and pasting into the "quote do .. end" block (Listing 1.3).

Inside that code we have "unquote(link)" and "unquote(expression)". You also must read the documentation on "Quote and Unquote". It just expands this "external" code inside the macro code to defer execution until the macro quote code is actually executed, instead of running it at that exact moment. I know, tricky to wrap your head around this the first time.

The bottom line is: whatever code is inside the "quote" block will be "inserted" where we called "ExMangaDownloadr.fetch/2" in the "pages/1" function in Listing 1.2, together with the unquoted code you passed as a parameter.

The resulting code will resemble the original code in Listing 1.1.

To make it simpler, if you were in Javascript this would be a similar code:

1
2
3
4
5
6
function fetch(url) {
    eval("doSomething('" + url + "')");
}
function pages(page_link) {
    fetch(page_link);
}

"Quote" would be like the string body in an eval and "unquote" just concatenating the value you passed inside the code being eval-ed. This is a crude metaphor as "quote/unquote" is way more powerful and cleaner than ugly "eval" (you shouldn't be using, by the way!) But this metaphor should do to make you understand the code above.

Another place I used a macro was to save the images list in a dump file and load it later if the tool crashes for some reason, in order not to have to start over from scratch. The original code was like this:

1
2
3
4
5
6
7
8
9
10
11
dump_file = "#{directory}/images_list.dump"
images_list = if File.exists?(dump_file) do
                :erlang.binary_to_term(File.read!(dump_file))
              else
                list = [url, source]
                  |> Workflow.chapters
                  |> Workflow.pages
                  |> Workflow.images_sources
                File.write(dump_file, :erlang.term_to_binary(list))
                list
              end

(Listing 1.4)

And now that you understand macros, you will understand what I did here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
defmodule ExMangaDownloadr do
  ...
  defmacro managed_dump(directory, do: expression) do
    quote do
      dump_file = "#{unquote(directory)}/images_list.dump"
      images_list = if File.exists?(dump_file) do
          :erlang.binary_to_term(File.read!(dump_file))
        else
          list = unquote(expression)
          File.write(dump_file, :erlang.term_to_binary(list))
          list
        end
    end
  end
  ...
end

defmodule ExMangaDownloadr.CLI do
  alias ExMangaDownloadr.Workflow
  require ExMangaDownloadr
  ...
  defp process(manga_name, directory, {_url, _source} = manga_site) do
    File.mkdir_p!(directory)

    images_list = 
      ExMangaDownloadr.managed_dump directory do
        manga_site
          |> Workflow.chapters
          |> Workflow.pages
          |> Workflow.images_sources 
      end
    ...
  end
  ...
end

And there you have it! And now you see how "do .. end" blocks are implemented. It just passes the expression as the value in the keyword list of the macro definition. Let's define a dumb macro:

1
2
3
4
5
6
7
defmodule Foo
  defmacro foo(do: expression) do
     quote do
       unquote(expression)
     end
  end
end

And not the following calls are all equivalent:

1
2
3
4
5
6
7
8
require Foo
Foo.foo do
  IO.puts(1)
end
Foo.foo do: IO.puts(1)
Foo.foo(do: IO.puts(1))
Foo.foo([do: IO.puts(1)])
Foo.foo([{:do, IO.puts(1)}])

This is macros combined with Keyword Lists which I explained in previous articles and it's simply a List with tuples where each tuple has an atom key and a value.

More Macros

Another opportunity to refactor were the "mangareader.ex" and "mangafox.ex" modules that were just used in the unit tests "mangareader_test.ex" and "mangafox_test.ex". This is the old "mangareader.ex" code:

1
2
3
4
5
6
7
8
9
defmodule ExMangaDownloadr.MangaReader do
  defmacro __using__(_opts) do
    quote do
      alias ExMangaDownloadr.MangaReader.IndexPage
      alias ExMangaDownloadr.MangaReader.ChapterPage
      alias ExMangaDownloadr.MangaReader.Page
    end
  end
end 

And this is how it was used in "mangareader_test.ex":

1
2
3
4
defmodule ExMangaDownloadr.MangaReaderTest do
  use ExUnit.Case
  use ExMangaDownloadr.MangaReader
  ...

It was just a shortcut to alias the modules in order to use them directly inside the tests. I just moved the entire module as a macro in "ex_manga_downloadr.ex" module:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
  ...
  def mangareader do
    quote do
      alias ExMangaDownloadr.MangaReader.IndexPage
      alias ExMangaDownloadr.MangaReader.ChapterPage
      alias ExMangaDownloadr.MangaReader.Page
    end
  end

  def mangafox do
    ...
  end

  defmacro __using__(which) when is_atom(which) do
    apply(__MODULE__, which, [])
  end
end

And now I can use it like this in the test file:

1
2
3
4
defmodule ExMangaDownloadr.MangaReaderTest do
  use ExUnit.Case
  use ExMangaDownloadr, :mangareader
  ...

The special "using" macro is called when I "use" a module, and I can even pass arguments to it. The implementation then uses "apply/3" to dynamically call the correct macro. This exactly how Phoenix imports the proper behaviors for Models, Views, Controllers, Router, for example:

1
2
3
defmodule Pxblog.PageController do
  use Pxblog.Web, :controller
  ...

The macros are open in a Phoenix file and available in the "web/web.ex" module, so I just copied the same behavior. And now I have 2 less files to worry about.

Tiny refactorings

In the previous code I used the "String.to_atom/1" to convert the string of the module name to an atom, to be later used in "apply/3" calls:

1
2
3
4
5
6
defp manga_source(source, module) do
  case source do
    "mangareader" -> String.to_atom("Elixir.ExMangaDownloadr.MangaReader.#{module}")
    "mangafox"    -> String.to_atom("Elixir.ExMangaDownloadr.Mangafox.#{module}")
  end
end

I changed it to this:

1
2
    "mangareader" -> :"Elixir.ExMangaDownloadr.MangaReader.#{module}"
    "mangafox"    -> :"Elixir.ExMangaDownloadr.Mangafox.#{module}"

It's just a shortcut to do the same thing.

In the parser I was also not using Floki correctly. So take a look at this piece of old code:

1
2
3
4
5
6
7
8
9
defp fetch_manga_title(html) do
  Floki.find(html, "#mangaproperties h1")
  |> Enum.map(fn {"h1", [], [title]} -> title end)
  |> Enum.at(0)
end
defp fetch_chapters(html) do
  Floki.find(html, "#listing a")
  |> Enum.map fn {"a", [{"href", url}], _} -> url end
end

Now using the better helper functions that Floki provides:

1
2
3
4
5
6
7
8
9
10
defp fetch_manga_title(html) do
  html
  |> Floki.find("#mangaproperties h1")
  |> Floki.text
end
defp fetch_chapters(html) do
  html
  |> Floki.find("#listing a")
  |> Floki.attribute("href")
end

This was a case of not reading the documentation as I should have. Much cleaner!

I did other bits of cleanup but I think this should cover the major changes. And finally, I bumped up the version to "1.0.0" as well!

1
2
3
4
5
   def project do
     [app: :ex_manga_downloadr,
-     version: "0.0.1",
+     version: "1.0.0",
      elixir: "~> 1.1",

And speaking of versions, I'm using Elixir 1.1 but pay attention as Elixir 1.2 is just around the corner and it brings some niceties. For example, that macro that aliased a few modules could be written this way now:

1
2
3
4
5
def mangareader do
  quote do
    alias ExMangaDownloadr.MangaReader.{IndexPage, ChapterPage, Page}
  end
end

And this is just 1 feature between many other syntax improvements and support for the newest Erlang R18. Keep an eye on both!

Comments

comentários deste blog disponibilizados por Disqus