Petabytes of Compliance Mail Journal Data - An Interesting Challenge


Here at Cloudficient, we’re all about helping enterprises solve their transformation challenges. Over the last several months, a few very large companies asked for our opinion on how to solve an interesting challenge they all had in common.

I explicitly mention “large” and “few” because this problem is unique to some of the largest enterprises in the world. In this first part of a two-part blog, I would like to introduce the overall challenge. In the second and final part, we will focus on our suggested solution and how it addresses the customers’ unique situation.


As we cannot disclose specific details, let’s establish an example customer: Contoso. Contoso's current situation and challenges are a composite based on discussions with, and input from, the organizations mentioned above, all of which are active Enterprise Vault customers with multiple petabytes of journal data looking for options to change their current approach.


Contoso is a global financial institution that has used Veritas Enterprise Vault for email journaling for many years, with 135 deployed Enterprise Vault servers. Contoso has accumulated 2.5 PB of data (compressed) in Enterprise Vault environments across various countries. The existing Enterprise Vault installation has exceeded both its architected capacity and any justifiable cost. As Contoso is a regulated business, supervision and SEC/FINRA compliance are essential.

Contoso uses SEC 17a-4-compliant object storage (EMC ECS) as the archive storage for Enterprise Vault.

Contoso uses Veritas Compliance Accelerator for broker-dealer supervision. For eDiscovery, Veritas Discovery Accelerator is used to cull, filter, and produce datasets from the journal archive, which are then investigated and managed as legal cases in Nuix (US) and Relativity (EMEA).

What options does Contoso have? 

Keep everything as it is

Maintaining the status quo would mean preserving the on-premises Enterprise Vault infrastructure and the relevant support structures for years to come. While this approach poses several business and technical challenges, the associated costs would probably be the most significant. Currently, Contoso’s yearly TCO to maintain the Enterprise Vault environment is $12.5M. This includes everything from administration staff to associated software maintenance, and all the database, index, and storage (primary, backup, replication) infrastructure in between. This cost will inevitably grow as new data is added over time.

Migrate to a cloud archive


Many organizations choose to migrate to cloud archive providers (e.g., Proofpoint, Smarsh, or Global Relay). Unfortunately, even in this scenario, the original on-premises archiving infrastructure must be maintained until the migration is complete.

Based on the estimates provided by the cloud vendors, Contoso's migration could take about 4.5 years.

  • Estimated cost to migrate data to the cloud archive of choice: $10,240,000
    • Assumes migration costs of $4,000 per TB for 2.5 PB (2,560 TB, counting 1 PB as 1,024 TB)
    • Cost includes any required EV extraction and cloud ingestion fees and services (fees depend on the cloud vendor, and are averaged out)
  • Estimated time to migrate data to the cloud archive of choice: ~4.5 years
    • Assumes an ingestion rate of 2.5 TB per day for 4 PB of uncompressed data
      • While extraction might happen faster, most cloud vendors require the customer to extract legacy data to PST files and ship them to the vendor on physical disks
      • The PST files need to be processed, and the messages deduplicated and assigned to custodian archives in the cloud system
      • Therefore, ingestion is the inevitable bottleneck in this data migration scenario
    • Based on Contoso’s yearly TCO, maintaining the Enterprise Vault environment for 4.5 years would cost $56.25M
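The estimates above are simple arithmetic, so they can be reproduced with a short script. This is a sketch, not a pricing tool, and it rests on a few assumptions: binary units (1 PB = 1,024 TB), which is the convention that reproduces the $10,240,000 figure; a 365-day year; and a flat yearly TCO. Because the text notes that the real TCO grows over time, the legacy figure is a lower bound. The function and class names are illustrative only.

```python
from dataclasses import dataclass

# Assumption: binary units, which reproduce the $10,240,000 figure in the text.
TB_PER_PB = 1024


@dataclass
class MigrationEstimate:
    migration_cost_usd: float
    ingestion_days: float
    ingestion_years: float


def estimate_migration(compressed_pb: float, uncompressed_pb: float,
                       cost_per_tb_usd: float,
                       ingest_tb_per_day: float) -> MigrationEstimate:
    # As in the text, the per-TB migration fee is applied to the compressed volume.
    cost = compressed_pb * TB_PER_PB * cost_per_tb_usd
    # Ingestion processes the uncompressed volume and is the bottleneck.
    days = uncompressed_pb * TB_PER_PB / ingest_tb_per_day
    return MigrationEstimate(cost, days, days / 365)  # assumes a 365-day year


def legacy_run_cost(yearly_tco_usd: float, years: float) -> float:
    # Flat TCO; real TCO grows as data is added, so this is a lower bound.
    return yearly_tco_usd * years


est = estimate_migration(compressed_pb=2.5, uncompressed_pb=4.0,
                         cost_per_tb_usd=4_000, ingest_tb_per_day=2.5)
print(f"Migration cost: ${est.migration_cost_usd:,.0f}")
print(f"Ingestion time: {est.ingestion_days:,.0f} days (~{est.ingestion_years:.1f} years)")
print(f"Legacy EV cost over 4.5 years: ${legacy_run_cost(12_500_000, 4.5):,.0f}")
```

Under these assumptions, the exact ingestion duration is about 1,638 days. That is just under 4.5 years, which the vendor estimate rounds to ~4.5 years.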

Furthermore, some cloud vendors charge very high yearly fees for stored data. Our customers were quoted $3,500 per TB per year, which seems ridiculously high compared to native cloud storage (e.g., AWS or Azure) or on-premises storage (e.g., Hitachi or Dell EMC).
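The following sketch applies the quoted rate to Contoso's volumes to put it in perspective. The text does not state whether vendors bill on compressed, uncompressed, or deduplicated volume, so the sketch brackets the fee using the 2.5 PB compressed and 4 PB uncompressed figures from the example. It also assumes the same 1 PB = 1,024 TB convention used for the migration estimate. The function name is hypothetical.

```python
# Assumption: same binary convention as the migration estimate (1 PB = 1,024 TB).
TB_PER_PB = 1024

# Per TB per year, as quoted to our customers (figure from the text).
VENDOR_FEE_PER_TB_YEAR = 3_500


def annual_storage_fee(volume_pb: float, fee_per_tb_year_usd: float) -> float:
    """Yearly storage fee for an archived volume at a flat per-TB rate."""
    return volume_pb * TB_PER_PB * fee_per_tb_year_usd


# Assumption: the billed volume is unknown, so bracket it with both example figures.
for label, pb in [("2.5 PB compressed", 2.5), ("4 PB uncompressed", 4.0)]:
    print(f"{label}: ${annual_storage_fee(pb, VENDOR_FEE_PER_TB_YEAR):,.0f} per year")
```

Under these assumptions, the yearly fee lands between roughly $9.0M and $14.3M. That is in the same range as the $12.5M Contoso currently spends on its entire Enterprise Vault environment.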


My Conclusion

Long-term storage and retrieval of archived corporate email/business records are crucial for the vast majority of enterprises (e.g., for historical reference) and often even mandatory to fulfill corporate governance and retention policies (e.g., legal events, eDiscovery).

As companies of all sizes undergo more and more cloud transformation, the desire grows to migrate legacy systems that have reached the end of their lifecycle and replace them with suitable cloud systems.

Despite certain advantages of storing large datasets in cloud systems, the cost and time required to migrate and process them into new systems that use more current technologies are a significant challenge. Aside from the simple physics of migrating data volumes typical of large enterprise organizations (e.g., extractions/ingestions, disk shipping, etc.), the legacy system must be maintained for governance purposes until the migration is complete.

With transition timelines spanning multiple years and resulting in zero or negative ROI, the business case is unfortunately often not compelling. Even if the negative ROI is accepted and the move to the cloud is completed, the challenge is not solved. It only grows over time and will resurface, at the latest, with the next desired technology change.

Given the facts above, and considering that technology evolves ever faster these days (for example, with more AI assistance or better content classification), the challenge for the future becomes:

What happens if I want to switch the cloud provider in 5 years?

Stay tuned for the second and final part of this series, where we will focus on our suggested solution to this unique situation as well as the long-term challenge!

