A product for finding duplicate files came up during the recent Data Deduplication WebForum.


Moderator: Our next topic involves the detection and reduction of duplicate files.

Dan S.: We found a program called SpaceObServer. It goes out and catalogs all of our file servers, which is about 20 TB of data, builds a database, and identifies duplicates within a server. We are looking to do that globally across our organization, and right now we are working on taking the data from SpaceObServer and writing our own reports to grab those duplicates. We found 250 copies of the exact same spreadsheet, not altered in any way. SpaceObServer does a good job of identifying that, but, you know, what do you do from there? Do you go to your users and say, you have 250 of these, delete 249 of them? At what point is there a payback in having employees dig through all their folders to do it?
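The transcript does not describe how SpaceObServer detects duplicates internally. As a minimal sketch of the general technique a home-grown, organization-wide report could use, the code below buckets files by size and then hashes only same-size candidates with SHA-256, so identical copies on different servers land in one group. It assumes each server's share is reachable as a local or mounted path; the function names `find_duplicates`, `file_digest`, and `reclaimable_bytes` are hypothetical, and the script only reports duplicates, leaving deletion to policy.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Group byte-identical files found under any of the given root directories.

    Pass 1 buckets files by size (cheap, from file metadata); pass 2 hashes only
    sizes shared by two or more files. Returns {(size, digest): [paths]} for
    groups with at least two copies. Empty files are ignored.
    """
    by_size = defaultdict(list)
    for root in roots:  # hypothetical: one mounted path per file server
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.islink(path):  # skip symlinks to avoid double counting
                        continue
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # unreadable or vanished file; skip it

    by_key = defaultdict(list)
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:  # a unique size cannot have a duplicate
            continue
        for path in paths:
            try:
                by_key[(size, file_digest(path))].append(path)
            except OSError:
                continue
    return {key: paths for key, paths in by_key.items() if len(paths) > 1}


def reclaimable_bytes(duplicates):
    """Bytes freed by keeping one copy per group: (copies - 1) * size."""
    return sum((len(paths) - 1) * size for (size, _digest), paths in duplicates.items())
```

Sorting the result by the number of paths per group would put cases like the 250-copy spreadsheet at the top of the report, and `reclaimable_bytes` gives a rough figure for weighing whether the cleanup effort pays back.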

Moderator: So obviously, the Data Domain tool does not reach into this part of your enterprise.

Dan S.: Correct.

Moderator: So, SpaceObServer does this.

Dan S.: Yes. We bought it, but it is licensed per server, and the data is really server-centric, not organization-centric.

Moderator: So it just identifies the duplicate files and then you have to go in and manually take care of those copies?

Dan S.: Yes. From a policy standpoint, that is the tougher part. How do I go into people’s directories and delete all their copies? I have deleted, just on my own, over 2 TB of videos that I found out here. We are relatively open, so, you know, I find vacation videos, and I find vacation photos all the time. I will get 300 GB of data in one afternoon in people’s personal folders. You find out, oh, yes, they just came back from some place and just started downloading all their stuff.

Chris B.: Are your file servers running Windows Storage Server?

Dan S.: No.

Chris B.: If you were running Windows Storage Server, you could actually write rules that prevent those file types from even being saved on that server.

Dan S.: Our biggest problem is the fact that we use video. This is why all of our employees know how to upload video. We use video as part of our business. We also use digital photography as part of our business. So, I looked at that. I looked at limiting the amount of information people can store, and they say they need to do it for their jobs, so it is a lot of separating what is business and what is not.

Chris B.: That is always the tough one.

Moderator: Have you run into any problems with deleting something that someone got upset about?

Dan S.: Oh yes. We have an area I call the Wild West: right now it is about 600 GB of shared space where everybody throws stuff just to share with other people. I deleted somebody’s Russian vacation, where they went to Russia for vacation, and, you know, I got a call the next week going, my vacation pictures are gone. So, yes, it has happened.

