Writing Great Unit Tests: Best and Worst Practices (Translation)

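Nobody at my company teaches unit testing.

Yet The Pragmatic Programmer, Code Craft, Agile Software Development: Principles, Patterns, and Practices, Refactoring, Code Complete, and Working Effectively with Legacy Code all say the same thing: "Write unit tests!"

So I installed a test framework and wrote some tests without much thought, and ended up in exactly this situation:

It’s overwhelmingly easy to write bad unit tests that add very little value 
to a project while inflating the cost of code changes astronomically.

Then I found an excellent blog post (Writing great unit tests http://blog.stevensanderson.com/2009/08/24/writing-great-unit-tests-best-and-worst-practises/) and decided to translate it loosely.

I favoured a free rendering that I can follow myself over a literal one, but my English is weak, so there are probably plenty of mistakes. If the meaning is badly off somewhere, or I have plainly misunderstood something, I would be grateful for corrections. Here goes.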

This blog post is aimed at developers with at least a small amount of unit testing experience. 
If you’ve never written a unit test, please read an introduction and have a go at it first.

This article is written for people who have written at least a little unit test code. If you have never written a unit test, read this first.

http://www.amazon.co.jp/%E9%81%94%E4%BA%BA%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9E%E3%83%BC%E2%80%95%E3%82%BD%E3%83%95%E3%83%88%E3%82%A6%E3%82%A7%E3%82%A2%E9%96%8B%E7%99%BA%E3%81%AB%E4%B8%8D%E5%8F%AF%E6%AC%A0%E3%81%AA%E5%9F%BA%E7%A4%8E%E7%9F%A5%E8%AD%98-%E3%83%90%E3%83%BC%E3%82%B8%E3%83%A7%E3%83%B3%E7%AE%A1%E7%90%86-%E3%83%A6%E3%83%8B%E3%83%83%E3%83%88%E3%83%86%E3%82%B9%E3%83%88-software-engineering/dp/475614599X/ref=sr_1_2?s=books&ie=UTF8&qid=1339939476&sr=1-2

What’s the difference between a good unit test and a bad one? 
How do you learn how to write good unit tests? 
It’s far from obvious. 
Even if you’re a brilliant coder with decades of experience, 
your existing knowledge and habits won’t automatically lead you to write good unit tests, 
because it’s a different kind of coding and most people start with 
unhelpful false assumptions about what unit tests are supposed to achieve.

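What is the difference between a good unit test and a bad one?
How should you learn to write good unit tests?
Neither answer is obvious.
Even if you are an excellent programmer with plenty of experience, your existing knowledge and habits alone will rarely lead you to good unit tests (unless you also learn unit testing properly).
Everyday coding and unit test coding are different kinds of work, yet most people start without correctly understanding what unit tests are for.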

Most of the unit tests I see are pretty unhelpful. 
I’m not blaming the developer: 
Usually, he or she just got told to start unit testing, 
so they installed NUnit and started churning out [Test] methods. 
Once they saw red and green lights, they assumed they were doing it correctly. 
Bad assumption! 
It’s overwhelmingly easy to write bad unit tests that add very little value 
to a project while inflating the cost of code changes astronomically. 
Does that sound agile to you?

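Most of the unit tests I see are pretty unhelpful.
I am not blaming the developers: usually they were just told to "start unit testing", so they installed NUnit and started writing [Test] methods.
Once they saw red and green bars, they assumed they were doing it right.
Bad assumption!
Writing bad unit tests that add almost nothing to a project while driving up the cost of code changes is overwhelmingly easy, far easier than writing good ones.
Does that sound agile to you?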

Unit testing is not about finding bugs
Now, I’m strongly in favour of unit testing, 
but only when you understand what role unit tests play within the Test Driven Development (TDD) process, 
and squash any misconception that unit tests have anything to do with testing for bugs.

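Unit testing is not about finding bugs
I am strongly in favour of unit testing, but only when the role unit tests play within test-driven development is properly understood, and any misconception that unit tests have something to do with finding bugs has been squashed.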

In my experience, unit tests are not an effective way to find bugs or detect regressions. 
Unit tests, by definition, examine each unit of your code separately. 
But when your application is run for real, all those units have to work together, 
and the whole is more complex and subtle than the sum of its independently-tested parts. 
Proving that components X and Y both work independently doesn’t prove that 
they’re compatible with one another or configured correctly. 
Also, defects in an individual component may bear no relationship to 
the symptoms an end user would experience and report. 
And since you’re designing the preconditions for your unit tests, 
they won’t ever detect problems triggered by preconditions that 
you didn’t anticipate 
(for example, if some unexpected IHttpModule interferes with incoming requests).

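In my experience, unit tests are not an effective way to find bugs or regressions.
By definition, unit tests examine each unit of your code separately.
But when the application actually runs, those units have to work together, and the behaviour of the whole is more complex and subtle than the simple sum of its separately tested parts.
Confirming that components X and Y each work on their own does not guarantee that they are compatible with each other or configured correctly.
Also, a defect in an individual component may look completely unrelated to the symptom the end user actually sees and reports.
And because you are the one who designs the preconditions for a unit test, it cannot detect problems triggered by preconditions you did not anticipate (for example, some unexpected IHttpModule interfering with incoming requests).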
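
The point about components X and Y can be shown with a small sketch. It uses Python's standard unittest module rather than the NUnit mentioned in the article, and export_order and import_order are hypothetical units invented for illustration. Each unit passes its own test, yet the two disagree on the record key, so wiring them together fails.

```python
import unittest


def export_order(order_id):
    """Hypothetical unit X: produces a record keyed by 'id'."""
    return {"id": order_id}


def import_order(record):
    """Hypothetical unit Y: expects a record keyed by 'order_id'."""
    return record["order_id"]


class ExportOrderTests(unittest.TestCase):
    def test_export_order_returns_record_with_id(self):
        # Passes: X does what its own specification says.
        self.assertEqual(export_order(7), {"id": 7})


class ImportOrderTests(unittest.TestCase):
    def test_import_order_reads_order_id_from_record(self):
        # Passes: Y works against the precondition this test designed.
        self.assertEqual(import_order({"order_id": 7}), 7)
```

Both test classes go green, but import_order(export_order(7)) raises KeyError. Only a test that runs the two units together would notice.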

So, if you’re trying to find bugs, 
it’s far more effective to actually run the whole application together as it will run in production, 
just like you naturally do when testing manually. 
If you automate this sort of testing in order to detect breakages when they happen in the future, 
it’s called integration testing and typically uses different techniques and technologies than unit testing. 
Don’t you want to use the most appropriate tool for each job?

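So if you want to find bugs, it is far more effective to run the whole application the way it runs in production, just as you naturally do when testing by hand.
If you automate this kind of testing to catch future breakages, it is called integration testing, and it typically uses different techniques and technologies from unit testing.
You want the right tool for each job, don’t you?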

Goal	Strongest technique
Finding bugs (things that don’t work as you want them to)	Manual testing (sometimes also automated integration tests)
Detecting regressions (things that used to work but have unexpectedly stopped working)	Automated integration tests (sometimes also manual testing, though time-consuming)
Designing software components robustly	Unit testing (within the TDD process)

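Goal: Finding bugs (things that do not work as intended)
Effective technique: Manual testing (sometimes also automated integration tests)

Goal: Detecting regressions (things that used to work but have stopped working)
Effective technique: Automated integration tests (sometimes also manual testing, though it takes time)

Goal: Designing software components robustly
Effective technique: Unit testing within the TDD process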

(Note: there’s one exception where unit tests do effectively detect bugs. 
It’s when you’re refactoring, i.e., restructuring a unit’s code but without meaning to change its behaviour. 
In this case, unit tests can often tell you if the unit’s behaviour has changed.)

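(Note: there is one exception where unit tests are effective at detecting bugs.
It is when you are refactoring, that is, changing a unit’s code without intending to change its behaviour.
In that case, unit tests can often tell you whether the unit’s behaviour has changed.)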

Well then, if unit testing isn’t about finding bugs, what is it about?
I bet you’ve heard the answer a hundred times already, but since the testing misconception stubbornly hangs on in developers’ minds, 
I’ll repeat the principle. As TDD gurus keep saying, 
“TDD is a design process, not a testing process”. 
Let me elaborate: 
TDD is a robust way of designing software components (“units”) interactively 
so that their behaviour is specified through unit tests. 
That’s all!

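So if unit testing is not about finding bugs, what is it for?
You have surely heard the answer many times already.
Still, the misconception about testing hangs on stubbornly in developers’ minds.
So here is the principle once more.
As the TDD gurus keep saying,
"TDD is a design process, not a testing process".
Let me elaborate.
TDD is a robust way of designing software components ("units") interactively, so that their behaviour is specified through unit tests.
That is all!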

Good unit tests vs bad ones
TDD helps you to deliver software components that individually behave according to your design. A suite of good unit tests is immensely valuable: it documents your design, and makes it easier to refactor and expand your code while retaining a clear overview of each component’s behaviour. However, a suite of bad unit tests is immensely painful: it doesn’t prove anything clearly, and can severely inhibit your ability to refactor or alter your code in any way.

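Good unit tests vs bad ones
TDD helps you deliver software components that each behave according to your design.
A suite of good unit tests is immensely valuable.
It documents your design, and it makes your code easier to refactor and extend while keeping a clear overview of each component’s behaviour.
A suite of bad unit tests, however, is just painful: it proves nothing clearly, and it gets in the way of refactoring or changing your code at all.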

Where do your tests sit on the following scale?

http://blog.stevensanderson.com/wp-content/uploads/2009/08/image-thumb1.png

Where do your tests sit on this scale?

Unit tests created through the TDD process naturally sit at the extreme left of this scale. 
They contain a lot of knowledge about the behaviour of a single unit of code.
If that unit’s behaviour changes, so must its unit tests, and vice-versa. 
But they don’t contain any knowledge or assumptions about other parts of your codebase, 
so changes to other parts of your codebase don’t make them start failing 
(and if yours do, that shows they aren’t true unit tests). 
Therefore they’re cheap to maintain, and as a development technique, 
TDD scales up to any size of project.

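Unit tests created through the TDD process naturally sit at the far left of this scale.
They contain a lot of knowledge about the behaviour of a single unit of code.
If that unit’s behaviour changes, its unit tests must change too, and if you change the unit tests, the corresponding code must change.
On the other hand, they know nothing about, and assume nothing about, the other parts of your codebase, so changing those other parts does not make them fail (if it does, they were not true unit tests).
That is why they are cheap to maintain.
And as a development technique, TDD scales to projects of any size.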

At the other end of the scale, integration tests contain no knowledge about how your codebase is broken down into units, 
but instead make statements about how the whole system behaves towards an external user. 
They’re reasonably cheap to maintain 
(because no matter how you restructure the internal workings of your system, it needn’t affect an external observer) 
and they prove a great deal about what features are actually working today.

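At the other end of the scale (the far right), integration tests know nothing about how your codebase is broken down into units. Instead, they make statements about how the whole system behaves towards an external user.
They are reasonably cheap to maintain, because however much you restructure the internals of your system, an external observer need not be affected, and they prove a great deal about which features actually work today.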

Anywhere in between, it’s unclear what assumptions you’re making and what you’re trying to prove. 
Refactoring might break these tests, or it might not, regardless of whether the end-user experience still works. 
Changing the external services you use (such as upgrading your database) might break these tests, 
or it might not, regardless of whether the end-user experience still works. 
Any small change to the internal workings of a single unit might force you to 
fix hundreds of seemingly unrelated hybrid tests, 
so they tend to consume a huge amount of maintenance time, sometimes in 
the region of 10 times longer than you spend maintaining the actual application code. 
And it’s frustrating because you know that adding more preconditions to make these hybrid tests go green doesn’t truly prove anything.

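Anywhere between unit tests and integration tests on the scale, you get hybrid tests, where it is unclear what you are assuming and what you are trying to prove.
Refactoring might break these tests, or might not, regardless of whether the end-user experience still works.
Changing an external service you use (for example, upgrading your database) might break them, or might not, again regardless of whether the end-user experience still works.
Any small change to the internal workings of a single unit might force you to fix hundreds of seemingly unrelated hybrid tests.
So these tests tend to consume a great deal of maintenance time.
In bad cases, fixing the tests takes around 10 times longer than maintaining the actual application code.
And it is frustrating, because you know that adding more preconditions just to make these hybrid tests go green proves nothing.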

Tips for writing great unit tests
Enough vague discussion: time for some practical advice. 
Here’s some guidance for writing unit tests that sit snugly at Sweet Spot A 
on the preceding scale, and are virtuous in other ways too.

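Tips for writing great unit tests
Enough vague discussion. From here on, practical advice.
Here is some guidance for writing unit tests that sit snugly at Sweet Spot A on the scale above, and that are virtuous in other ways too.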

    Make each test orthogonal (i.e., independent) to all the others
    Any given behaviour should be specified in one and only one test. 
Otherwise if you later change that behaviour, you’ll have to change multiple tests. 
The corollaries of this rule include:

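Make each test orthogonal (that is, independent) to all the others
Any given behaviour should be specified in one and only one test.
Otherwise, when you later change that behaviour, you will have to change multiple tests.
The corollaries of this rule include: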

        Don’t make unnecessary assertions
        Which specific behaviour are you testing? 
It’s counterproductive to Assert() anything that’s also asserted by another test: 
it just increases the frequency of pointless failures without improving unit test coverage one bit. 
This also applies to unnecessary Verify() calls: 
if it isn’t the core behaviour under test, then stop making observations about it! 
Sometimes, TDD folks express this by saying “have only one logical assertion per test”.
        Remember, unit tests are a design specification of how a certain behaviour should work, 
not a list of observations of everything the code happens to do.

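Don’t make unnecessary assertions
Which specific behaviour are you testing?
It is counterproductive to Assert() anything that another test already asserts.
It adds nothing to unit test coverage and only increases the number of pointless failures.
The same goes for unnecessary Verify() calls.
If it is not the core behaviour under test, stop making observations about it.
TDD people sometimes put this as "have only one logical assertion per test".
Remember, unit tests are a design specification of how a certain behaviour should work, not a list of everything the code happens to do.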
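
As a rough illustration of "one logical assertion per test", here is a sketch in Python's unittest (a stand-in for NUnit's Assert calls). ShoppingCart and its methods are hypothetical names invented for this example. Each behaviour is specified by exactly one test, and the second test deliberately does not repeat what the first one already asserts.

```python
import unittest


class ShoppingCart:
    """Hypothetical unit under test, invented for this sketch."""

    def __init__(self):
        self._prices = []

    def add(self, price):
        self._prices.append(price)

    def item_count(self):
        return len(self._prices)

    def total(self):
        return sum(self._prices)


class ShoppingCartTests(unittest.TestCase):
    def test_add_increases_item_count(self):
        cart = ShoppingCart()
        cart.add(5)
        # Only the behaviour named by this test is asserted.
        self.assertEqual(cart.item_count(), 1)

    def test_total_is_sum_of_added_prices(self):
        cart = ShoppingCart()
        cart.add(5)
        cart.add(7)
        # item_count() is not asserted again here; the test above owns it.
        self.assertEqual(cart.total(), 12)
```

If the counting behaviour changes later, only the first test needs to change.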

        Test only one code unit at a time
        Your architecture must support testing units (i.e., classes or very small groups of classes) 
independently, not all chained together. 
Otherwise, you have lots of overlap between tests, so changes to one unit can cascade outwards and cause failures everywhere.
        If you can’t do this, then your architecture is limiting your work’s quality; consider using Inversion of Control.

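Test only one code unit at a time
Your architecture must support testing units (that is, classes or very small groups of classes) independently, not all chained together. (In other words: write code with high cohesion and low coupling.)
Otherwise your tests overlap heavily, and a change to one unit cascades into the tests for other units and causes failures everywhere.
If you cannot do this, your architecture is limiting the quality of your work.
Consider using Inversion of Control.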
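
One common form of Inversion of Control is constructor injection, sketched below in Python's unittest. This is an assumption about what the advice looks like in practice, not code from the article. InvoiceBuilder, FixedRateTaxCalculator, and StubTaxCalculator are hypothetical names. Because the unit receives its collaborator from outside, the test can hand it a trivial stub and exercise InvoiceBuilder alone.

```python
import unittest


class FixedRateTaxCalculator:
    """Hypothetical production collaborator."""

    def __init__(self, rate):
        self._rate = rate

    def tax_for(self, amount):
        return amount * self._rate


class InvoiceBuilder:
    """Hypothetical unit under test; its collaborator is injected."""

    def __init__(self, tax_calculator):
        self._tax_calculator = tax_calculator

    def total(self, net_amount):
        return net_amount + self._tax_calculator.tax_for(net_amount)


class StubTaxCalculator:
    """Test double that always reports the same tax."""

    def tax_for(self, amount):
        return 10


class InvoiceBuilderTests(unittest.TestCase):
    def test_total_adds_tax_from_calculator_to_net_amount(self):
        builder = InvoiceBuilder(StubTaxCalculator())
        # The real tax rules are not involved, so changing them
        # cannot make this test fail.
        self.assertEqual(builder.total(100), 110)
```

In production the same class would be wired up as InvoiceBuilder(FixedRateTaxCalculator(0.5)) or similar, and the tax rules get their own separate tests.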

        Mock out all external services and state
        Otherwise, behaviour in those external services overlaps multiple tests, 
and state data means that different unit tests can influence each other’s outcome.
        You’ve definitely taken a wrong turn if you have to run your tests in a specific order, 
or if they only work when your database or network connection is active.
        (By the way, sometimes your architecture might mean your code touches static variables during unit tests. 
Avoid this if you can, but if you can’t, at least make sure each test resets 
the relevant statics to a known state before it runs.)

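Mock out all external services and state
Otherwise, the behaviour of those external services overlaps multiple tests, and state data means that one unit test can influence the outcome of another.
If your tests have to run in a specific order, or only work when a database or network connection is live, you have definitely taken a wrong turn.
(By the way, your architecture might mean that your code touches static variables during unit tests.
Avoid this if you can.
If you really cannot, at least make sure each test resets the relevant statics to a known state before it runs.)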
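
A minimal sketch of mocking out external services, using Python's unittest.mock in place of the .NET mocking library the article's Verify() calls suggest. StockNotifier, units_in_stock, and send are hypothetical names, and the e-mail address is a placeholder. Each test builds fresh mocks, so no database, network, or shared state is involved and the tests can run in any order.

```python
import unittest
from unittest.mock import Mock


class StockNotifier:
    """Hypothetical unit under test with two external collaborators."""

    def __init__(self, stock_service, mailer):
        self._stock_service = stock_service
        self._mailer = mailer

    def notify_if_out_of_stock(self, product_id):
        if self._stock_service.units_in_stock(product_id) == 0:
            self._mailer.send("buyer@example.com",
                              f"Product {product_id} is out of stock")
            return True
        return False


class StockNotifierTests(unittest.TestCase):
    def test_notify_if_stock_is_zero_sends_one_mail(self):
        stock_service = Mock()
        stock_service.units_in_stock.return_value = 0
        mailer = Mock()

        StockNotifier(stock_service, mailer).notify_if_out_of_stock(42)

        # The only observation made is the behaviour under test.
        mailer.send.assert_called_once()

    def test_notify_if_stock_is_positive_sends_no_mail(self):
        stock_service = Mock()
        stock_service.units_in_stock.return_value = 3
        mailer = Mock()

        StockNotifier(stock_service, mailer).notify_if_out_of_stock(42)

        mailer.send.assert_not_called()
```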

        Avoid unnecessary preconditions
        Avoid having common setup code that runs at the beginning of lots of unrelated tests. 
Otherwise, it’s unclear what assumptions each test relies on, and indicates that you’re not testing just a single unit.
        An exception: Sometimes I find it useful to have a common setup method 
shared by a very small number of unit tests (a handful at the most) but only 
if all those tests require all of those preconditions. 
This is related to the context-specification unit testing pattern, 
but still risks getting unmaintainable if you try to reuse the same setup code for a wide range of tests.

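Avoid unnecessary preconditions
Avoid common setup code that runs at the start of many unrelated tests.
Otherwise it is unclear which assumptions each test relies on, and it indicates that you are not testing just a single unit.
An exception: sometimes it is useful for a very small number of unit tests (a handful at most) to share a common setup method.
But only if every one of those tests requires every one of those preconditions.
This is related to the context-specification unit testing pattern, but it still risks becoming unmaintainable if you try to reuse the same setup code across a wide range of tests.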
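
One way to read this advice, sketched in Python's unittest: instead of a large shared setUp, each test states only the preconditions it depends on. Account and its fields are hypothetical, invented for illustration.

```python
import unittest


class Account:
    """Hypothetical unit under test."""

    def __init__(self, balance=0, frozen=False):
        self.balance = balance
        self.frozen = frozen

    def withdraw(self, amount):
        if self.frozen:
            raise PermissionError("account is frozen")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountWithdrawTests(unittest.TestCase):
    # No setUp method: every precondition is visible inside the test
    # that relies on it.

    def test_withdraw_if_funds_sufficient_reduces_balance(self):
        account = Account(balance=100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)

    def test_withdraw_if_account_frozen_raises_permission_error(self):
        account = Account(balance=100, frozen=True)
        with self.assertRaises(PermissionError):
            account.withdraw(30)
```

Reading either test on its own tells you exactly which assumptions it makes.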

    (By the way, I wouldn’t count pushing multiple data points through the 
same test (e.g., using NUnit’s [TestCase] API) as violating this orthogonality rule. 
The test runner may display multiple failures if something changes, 
but it’s still only one test method to maintain, so that’s fine.)

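(By the way, I would not count running multiple data points through the same test (for example, using NUnit’s [TestCase] API) as violating this orthogonality rule.
The test runner may display multiple failures if something changes, but there is still only one test method to maintain, so that is fine.)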
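
The closest standard-library analogue to NUnit's [TestCase] in Python is probably subTest, shown in the sketch below. That mapping is my assumption, and is_leap_year is a hypothetical unit chosen only because its data points are easy to check. Several inputs go through one test method, and a failing input is reported individually.

```python
import unittest


def is_leap_year(year):
    """Hypothetical unit under test (Gregorian leap-year rule)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTests(unittest.TestCase):
    def test_is_leap_year_for_known_years_returns_expected_flag(self):
        cases = [(2000, True), (1900, False), (2012, True), (2013, False)]
        for year, expected in cases:
            # Each data point is reported separately if it fails,
            # but there is still only one test method to maintain.
            with self.subTest(year=year):
                self.assertEqual(is_leap_year(year), expected)
```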

    Don’t unit-test configuration settings
    By definition, your configuration settings aren’t part of any unit of code 
(that’s why you extracted the setting out of your unit’s code). 
Even if you could write a unit test that inspects your configuration, 
it merely forces you to specify the same configuration in an additional redundant location. 
Congratulations: it proves that you can copy and paste!

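Don’t unit-test configuration settings
By definition, configuration settings are not part of any unit of code (that is why you extracted them from the code in the first place!).
Even if you could write a unit test that inspects your configuration, it merely forces you to specify the same configuration again in an additional, redundant location.
Congratulations: that test proves you can copy and paste!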

    Personally I regard the use of things like filters in ASP.NET MVC as being configuration. 
Filters like [Authorize] or [RequiresSsl] are configuration options baked into the code. 
By all means write an integration test for the externally-observable behaviour, 
but it’s meaningless to try unit testing for the filter attribute’s presence in your source code: 
it just proves that you can copy and paste again. 
That doesn’t help you to design anything, and it won’t ever detect any defects.

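Personally, I regard things like filters in ASP.NET MVC as configuration.
Filters such as [Authorize] or [RequiresSsl] are configuration options baked into the code.
By all means write an integration test for the externally observable behaviour, but unit testing for the presence of a filter attribute in your source code is meaningless.
It just proves, once again, that you can copy and paste.
It does not help you design anything, and it will never detect a defect.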

    Name your unit tests clearly and consistently
    If you’re testing how ProductController’s Purchase action behaves when stock is zero, 
then maybe have a test fixture class called PurchasingTests with a unit test 
called ProductPurchaseAction_IfStockIsZero_RendersOutOfStockView(). 
This name describes the subject (ProductController’s Purchase action), 
the scenario (stock is zero), and the result (renders “out of stock” view). 
I don’t know whether there’s an existing name for this naming pattern, 
though I know others follow it. How about S/S/R?

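Name your unit tests clearly and consistently
If you are testing how ProductController’s Purchase action behaves when stock is zero, you might have a test fixture class called PurchasingTests containing a unit test called ProductPurchaseAction_IfStockIsZero_RendersOutOfStockView().
This name describes the subject (ProductController’s Purchase action), the scenario (stock is zero), and the result (renders the "out of stock" view).
I do not know whether this naming pattern already has a name, though I know others follow it. How about S/S/R?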

    Avoid non-descriptive unit tests names such as Purchase() or OutOfStock(). 
Maintenance is hard if you don’t know what you’re trying to maintain.

Avoid non-descriptive unit test names such as Purchase() or OutOfStock().
Maintenance is hard when you do not know what you are trying to maintain.
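
The subject/scenario/result pattern carries over to other frameworks. Below is a sketch in Python's unittest. The class and test names follow the article, but the controller body and the view names "OutOfStock" and "PurchaseComplete" are invented for illustration. The test_ prefix is there only because unittest needs it to discover the method.

```python
import unittest


class ProductController:
    """Simplified, hypothetical stand-in for the article's controller."""

    def __init__(self, units_in_stock):
        self._units_in_stock = units_in_stock

    def purchase(self):
        # Returns the name of the view to render.
        if self._units_in_stock == 0:
            return "OutOfStock"
        return "PurchaseComplete"


class PurchasingTests(unittest.TestCase):
    # Subject: ProductPurchaseAction
    # Scenario: IfStockIsZero
    # Result: RendersOutOfStockView
    def test_ProductPurchaseAction_IfStockIsZero_RendersOutOfStockView(self):
        controller = ProductController(units_in_stock=0)
        self.assertEqual(controller.purchase(), "OutOfStock")
```

When this test fails, its name alone says which behaviour of which unit broke.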

Conclusion

Without doubt, unit testing can significantly increase the quality of your project. 
Many in our industry claim that any unit tests are better than none, but I disagree: 
a test suite can be a great asset, or it can be a great burden that contributes little. 
It depends on the quality of those tests, which seems to be determined by how 
well its developers have understood the goals and principles of unit testing.

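Conclusion

Without doubt, unit testing can significantly raise the quality of your project.
Many people in our industry claim that any unit tests are better than none, but I disagree.
A test suite can be a great asset, or it can be a great burden that contributes little.
Which one it becomes depends on the quality of the tests, and that quality seems to be determined by how well the developers understand the goals and principles of unit testing.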

By the way, if you want to read up on integration testing (to complement your unit testing skills), 
check out projects such as Watin, Selenium, and even the ASP.NET MVC 
integration testing helper library I published recently.

By the way, if you want to read up on integration testing (to complement your unit testing skills), check out projects such as Watin and Selenium.
Also take a look at the ASP.NET MVC integration testing helper library that the author published recently.