In our first blog post on data deduplication best practices, we discussed the first best practice: considering the broader implications of deduplication. In this post, we’ll discuss two more.

In the simplest terms, data created by humans (documents, transactions, and email, for example) dedupes well in most dedupe systems. Photos, audio, video, imaging, or data created by computers generally don’t dedupe well, so you should store these sets of data on non-deduped storage. Learn what data does not dedupe well in your particular environment, and consider not deduping it. For some situations, you might consider a deduplication solution that can selectively exclude certain sets of data.
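
One rough way to learn what dedupes well in your environment is to sample some data and count how many of its chunks repeat. The sketch below is only an illustration: the function name `dedupe_ratio`, the fixed 4 KB chunk size, and the sample data are all assumptions, and real deduplication products typically use their own (often variable-size) chunking, so treat the result as a hint rather than a prediction.

```python
import hashlib
import os


def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Estimate a dedupe ratio as total chunks divided by unique chunks.

    Uses fixed-size chunks and SHA-256 fingerprints. This is a simplification;
    it is not how any particular product is known to work.
    """
    if not data:
        return 1.0
    fingerprints = set()
    total_chunks = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprints.add(hashlib.sha256(chunk).digest())
        total_chunks += 1
    return total_chunks / len(fingerprints)


# Hypothetical samples: one 4 KB block repeated 100 times stands in for
# highly redundant data; random bytes stand in for compressed media.
block = bytes(range(256)) * 16          # exactly 4096 bytes
repetitive = block * 100
random_like = os.urandom(len(repetitive))

print(dedupe_ratio(repetitive))   # 100.0: every chunk is identical
print(dedupe_ratio(random_like))  # 1.0 in practice: no chunk repeats
```

Running a check like this against representative files from each data set can help you decide which sets are worth sending to deduped storage and which are not.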

The length of time that data is retained affects data deduplication ratios: the more data that is examined when deduplicating new data, the more likely you are to find duplicate data and increase space savings.
While you should closely examine the deduplication ratio when you’re comparing multiple products, try not to overanalyze it once your system is up and running. Rather than performing more frequent full backups just to get a better data deduplication ratio, consider increasing your backup retention period for your on-disk data store. Once you have your first set of backups on disk, adding more backups to that same deduped system will take up less space than sending them to tape.
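
To see why a longer retention period raises the ratio, consider a toy model. It assumes that every retained backup is a full backup, that each one differs from the previous one by a fixed fraction, and that all unchanged data dedupes perfectly. The function name `retention_dedupe_ratio` and the numbers used (a 1,000 GB full backup and a 2% change rate) are illustrative assumptions, not measurements.

```python
def retention_dedupe_ratio(num_backups: int, full_size_gb: float,
                           change_rate: float) -> float:
    """Return logical size / stored size under a simplified model.

    The first backup is stored in full; each later backup adds only the
    fraction of data that changed (change_rate) to the deduped store.
    """
    if num_backups < 1:
        raise ValueError("num_backups must be at least 1")
    logical_gb = num_backups * full_size_gb
    stored_gb = full_size_gb + (num_backups - 1) * full_size_gb * change_rate
    return logical_gb / stored_gb


# Hypothetical comparison: keeping 4 versus 12 full backups on disk.
for retained in (4, 12):
    ratio = retention_dedupe_ratio(retained, 1000.0, 0.02)
    print(f"{retained} backups retained: about {ratio:.1f}:1")
```

Under these assumptions, the ratio climbs as more backups are retained, because each additional backup adds a full backup's worth of logical data but only a small amount of new stored data.
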
Three down, two more to go… come back soon to see our final two data deduplication best practices. And as always, we value your feedback and ideas on how you deal with data deduplication.