
Windows 2012 Data Deduplication

Windows Server 2012 was released today and is available for download. The first new feature I wanted to test is Data Deduplication.

Theory

Data Deduplication in Windows Server 2012 runs as a background job, by default every hour. The job runs when the server is idle, so it doesn’t eat all of the server’s free resources. Only one job can run per volume at a time, and it can process about 100 GB per hour. Deduplication works on variable-sized chunks (32-128 KB) of files. There is a nice picture at the bottom of this page. By default, only files that have not been accessed for more than 5 days are deduplicated.

By default, the following file types are not compressed after deduplication, because they are already compressed (this is the volume’s NoCompressionFileType setting; the files are still chunked and deduplicated): aac, aif, aiff, asf, asx, au, avi, flac, jpeg, m3u, mid, midi, mov, mp1, mp2, mp3, mp4, mpa, mpe, mpeg, mpeg2, mpeg3, mpg, ogg, qt, qtw, ram, rm, rmi, rmvb, snd, swf, vob, wav, wax, wma, wmv, wvx, accdb, accde, accdr, accdt, docm, docx, dotm, dotx, pptm, potm, potx, ppam, ppsx, pptx, sldx, sldm, thmx, xlsx, xlsm, xltx, xltm, xlsb, xlam, xll, ace, arc, arj, bhx, b2, cab, gz, gzip, hpk, hqx, jar, lha, lzh, lzx, pak, pit, rar, sea, sit, sqz, tgz, uu, uue, z, zip, zoo.

Let’s play

This feature is not enabled by default. You need to enable it with the Add Roles and Features wizard:

Adding Data Deduplication feature
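
If you prefer the command line to the wizard, the same feature can be added from PowerShell. This is only a minimal sketch, assuming an elevated session on Windows Server 2012; FS-Data-Deduplication is the feature name as I know it, so check it with Get-WindowsFeature on your build first.

```powershell
# Run in an elevated PowerShell session on Windows Server 2012.
Import-Module ServerManager

# FS-Data-Deduplication is the feature name for Data Deduplication.
Install-WindowsFeature -Name FS-Data-Deduplication

# Confirm that the feature now shows as installed.
Get-WindowsFeature -Name FS-Data-Deduplication
```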

Once this is done, you can use all the Data Deduplication PowerShell cmdlets. To use them, you need to run PowerShell as Administrator :-).
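
To see which cmdlets you actually got, you can list everything in the module. A small sketch, assuming the module is named Deduplication, as it is on my understanding of Windows Server 2012:

```powershell
# Load the module explicitly (PowerShell 3.0 would also auto-load it on first use).
Import-Module Deduplication

# List every cmdlet the Deduplication module exports.
Get-Command -Module Deduplication
```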

Microsoft engineers gave us a tool, DDPEVAL.exe, which checks the data on a drive and reports the expected deduplication savings. You can copy DDPEVAL.exe to another system and run it there to check the expected deduplication ratio before even trying deduplication.

So let’s copy some files to my new disk D:\ (system drives cannot be deduplicated) and run DDPEVAL.exe to see the expected savings:

DDPEVAL.exe result
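
The call behind a result like this is very short. A sketch, assuming the tool sits in System32 (where it should land once the feature is installed) and that D:\ is the volume you want to evaluate; both are assumptions, so adjust them to your system.

```powershell
# Assumption: ddpeval.exe is in System32 after the feature is installed.
# D:\ is the volume from this post; any folder path should work as well.
& "$env:SystemRoot\System32\ddpeval.exe" D:\
```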

This result comes from a couple of Excel and Word files. When I ran DDPEVAL.exe on my Music directory (all MP3s), I got space savings of 1%. So let’s enable data deduplication on disk D:

Enable-DedupVolume
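
In text form, enabling deduplication on the volume looks roughly like this (a minimal sketch, with D: being the test volume from this post):

```powershell
# Enable Data Deduplication on volume D: (run as Administrator).
Enable-DedupVolume -Volume "D:"
```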

As I mentioned earlier, the data deduplication job runs every hour. Let’s look at the deduplication scheduled jobs:

Get-DedupSchedule
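
For reference, this is the kind of call I mean; the second line dumps every property the cmdlet returns:

```powershell
# Default table view of the deduplication schedules.
Get-DedupSchedule

# Full property dump of the same objects.
Get-DedupSchedule | Format-List *
```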

Get-DedupSchedule is not a really good cmdlet, because it doesn’t show full information about the actual schedule (even when you pipe it to | fl *). When you check Task Scheduler you can see the correct schedule for

Deduplication schedule
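
The same information can be pulled from Task Scheduler with PowerShell instead of the GUI. A hedged sketch: the task folder \Microsoft\Windows\Deduplication\ is an assumption based on where I would expect these tasks to live, so verify the path in Task Scheduler first.

```powershell
# Assumption: the deduplication tasks live in this Task Scheduler folder.
$taskPath = "\Microsoft\Windows\Deduplication\"

# List the tasks and their current state.
Get-ScheduledTask -TaskPath $taskPath | Select-Object TaskName, State

# Show when each task last ran and when it will run next.
Get-ScheduledTask -TaskPath $taskPath | Get-ScheduledTaskInfo |
    Select-Object TaskName, LastRunTime, NextRunTime
```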

I’m not going to wait until tomorrow, so I will run this task right now and check the result. When I ran it, I could see CPU activity for a short time.

Deduplication job initializing
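
Starting the scheduled task by hand can also be done from PowerShell. This is a sketch under assumptions: the task folder is my guess at where the deduplication tasks live, and BackgroundOptimization is the task name used in this post.

```powershell
# Assumption: task folder and name as they appear in Task Scheduler.
Start-ScheduledTask -TaskPath "\Microsoft\Windows\Deduplication\" `
                    -TaskName "BackgroundOptimization"

# Show running or queued deduplication jobs and their progress.
Get-DedupJob
```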

Get-DedupMetadata result

Get-DedupStatus and Get-DedupVolume
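
These are the three status cmdlets used for the results above, written out as text (a minimal sketch against volume D:):

```powershell
# Chunk store details: chunk counts, sizes and so on.
Get-DedupMetadata -Volume "D:"

# Savings and optimized file counts for the volume.
Get-DedupStatus -Volume "D:"

# Volume-level view: enabled state, saved space, savings rate.
Get-DedupVolume -Volume "D:"
```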

So it looks like nothing was deduplicated. The answer is easy: I had just copied the data to the disk, so its last access time is not older than 5 days. And as you remember, by default Data Deduplication will not deduplicate files that were accessed within the last 5 days. So let’s wait for 5 days 🙂 ... Or better, let’s change this 5-day value to something different. Let’s look at the values set for the volume:

Get-DedupVolume values
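
To pick out just the policy values instead of reading the whole dump, something like this should do. The property names are the ones I expect on the Get-DedupVolume output; if one of them is missing on your build, its column simply stays empty.

```powershell
# Show only the policy-related settings of volume D:.
Get-DedupVolume -Volume "D:" |
    Select-Object Volume, MinimumFileAgeDays, ExcludeFileType, NoCompressionFileType |
    Format-List
```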

It is a per-volume setting. Let’s change it with the Set-DedupVolume cmdlet:

Set-DedupVolume
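
A sketch of the change in text form. The value 0 is an assumption (it makes freshly copied files eligible right away); pick whatever age fits your data.

```powershell
# Assumption: 0 days, so that files are eligible regardless of last access time.
Set-DedupVolume -Volume "D:" -MinimumFileAgeDays 0

# Check that the new value is in place.
Get-DedupVolume -Volume "D:" | Select-Object Volume, MinimumFileAgeDays
```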

So now I have eliminated the default value of 5 days. Let’s run BackgroundOptimization again and see what the result will be.

Manual Deduplication
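
Besides starting the scheduled task, an optimization job can also be started directly with Start-DedupJob. This is a minimal sketch of that alternative, not necessarily the exact command from the screenshot:

```powershell
# Start an optimization job on D: and block until it has finished.
Start-DedupJob -Volume "D:" -Type Optimization -Wait
```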

And now when I look at the status of Metadata and Volume I see:

Data Deduplication result

The real SavingRate is 5%, but DDPEVAL.exe estimated approximately 30% 🙂

If you want to disable deduplication on a volume, you can do it by running this command:

Disable data deduplication
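
In text form this is a one-liner. As far as I know, disabling only stops further optimization and leaves already optimized files as they are; the second command is an optional sketch for undoing the optimization, assuming the volume has enough free space for the full-size files.

```powershell
# Stop further deduplication on D:.
Disable-DedupVolume -Volume "D:"

# Optional: undo the optimization of files that were already deduplicated.
Start-DedupJob -Volume "D:" -Type Unoptimization
```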

Metadata is not self-cleaning, so there are two other tasks that clean up the metadata and take care of the data deduplication overhead. These two tasks are called WeeklyGarbageCollection and WeeklyScrubbing. Their jobs are described here.
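
If you don’t want to wait a week, both maintenance jobs can be started by hand. A minimal sketch against volume D:

```powershell
# Reclaim space from chunks that are no longer referenced.
Start-DedupJob -Volume "D:" -Type GarbageCollection

# Check the chunk store for corruption.
Start-DedupJob -Volume "D:" -Type Scrubbing

# Watch the queued and running jobs.
Get-DedupJob
```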

I think this new feature can be useful for every customer I know. All of them keep lots of data on file servers that is not accessed for a long time (sometimes years), and they don’t want to just delete these files and restore them from tape when they are needed.

More information can be found on the Storage Team blog.
