Instant Data Masking in RDBMS

Introduction

Data Masking involves replacing real-world sensitive data with synthetic replacements. If your sensitive data is scattered across several tables containing thousands, millions or even billions of rows, replacing all those values requires a lot of changes.

Customers often ask if it is possible to reduce the time and resources needed to mask data. You can reduce elapsed time by running masking jobs in parallel, but this increases overall resource consumption. The only way to reduce both time and resources is to do less work.

One of the benefits of a relational database (RDBMS) is that when data is normalised (de-duplicated), updates are very efficient because each distinct value only has to be updated once. In this blog article I suggest that all tables containing sensitive data should be normalised by extracting the sensitive data into lookup tables and replacing the original values with references. There are several benefits to this model:

  1. Provisioning masked clones is faster and more resource efficient
  2. The Production data is easier to secure
  3. The Production database and its clones become more performant

A Simple Example

Let’s take a simple example of a CUSTOMER table.

Let’s query the first few rows of this table

CUST_ID CUST_FIRSTNAME CUST_LASTNAME  CUST_SSN
------- -------------- -------------- ---------
      1 Mary           Smith          PL898474B
      2 Patricia       Johnson        RD993299B
      3 Linda          Williams       LU785983C
      4 Barbara        Jones          XK295759D
      5 Elizabeth      Brown          QA006325A
      6 Jennifer       Davis          KQ616191C
      7 Maria          Miller         EG308056D
      8 Susan          Wilson         XP917622A
      9 Margaret       Moore          ZH212274D
     10 Dorothy        Taylor         TX804130B

To make this table secure, build a secure version of the table which replaces all embedded sensitive literal values with references, which are resolved by lookups on tables containing the actual values.

Our new secure version of the table now looks like this:

CUST_ID CUST_FIRSTNAME_ID CUST_LASTNAME_ID CUST_SSN_ID
------- ----------------- ---------------- -----------
      1             1070             2083        3186
      2             1078             2044        3210
      3             1060             2095        3144
      4             1010             2045        3279
      5             1030             2010        3192
      6             1043             2021        3135
      7             1066             2054        3057
      8             1093             2096        3282
      9             1065             2056        3294
     10             1029             2085        3240

That looks a lot more secure.  The lookup or dimension tables look as follows:


Sample from FIRSTNAME_DIM table

FIRSTNAME_ID FIRSTNAME
------------ ---------
        1007 Anne
        1013 Bonnie
        1036 Helen
        1045 Joan
        1056 Kelly
        1073 Mildred
        1078 Patricia


Sample from LASTNAME_DIM table

LASTNAME_ID LASTNAME
----------- ---------
       2023 Edwards
       2025 Flores
       2055 Mitchell
       2063 Perez
       2068 Price
       2077 Ross
       2088 Torres
       2089 Turner


Sample from SSN_DIM table

    SSN_ID SSN
---------- ---------
      3012 AW876543D
      3033 CC929313D
      3057 EG308056D
      3135 KQ616191C
      3147 MK525013B
      3192 QA006325A
      3207 QO371839A

We have now built a cluster of tables: the secure CUSTOMER_SEC table referencing the three dimension tables.

To present the data in our secure table as it was stored in the original, we need a database view which pulls the data together.

create or replace view CUSTOMER_VIEW
as
select csd.CUST_ID, fnd.FIRSTNAME CUST_FIRSTNAME, lnd.LASTNAME CUST_LASTNAME, snd.SSN CUST_SSN
from CUSTOMER_SEC  csd
join FIRSTNAME_DIM fnd on csd.CUST_FIRSTNAME_ID = fnd.FIRSTNAME_ID
join LASTNAME_DIM  lnd on csd.CUST_LASTNAME_ID = lnd.LASTNAME_ID
join SSN_DIM       snd on csd.CUST_SSN_ID = snd.SSN_ID;

When we query from this view we get the fully expanded version.

SQL> select * from CUSTOMER_VIEW sample(10);

CUST_ID CUST_FIRSTNAME CUST_LASTNAME  CUST_SSN
------- -------------- -------------- ---------
     32 Amy            Lopez          LH062496C
     18 Carol          Garcia         NS733964A
     39 Debra          Nelson         TL389654A
     10 Dorothy        Taylor         TX804130B
     57 Evelyn         Morgan         NQ870196D
     86 Jacqueline     Long           GT430822C
     58 Jean           Bell           HG462950A
      6 Jennifer       Davis          KQ616191C
     36 Kathleen       Adams          MP278347B
     98 Lillian        Griffin        OT351086A
      3 Linda          Williams       LU785983C
     37 Pamela         Baker          GF089913C

Steps to create a secure CUSTOMER table cluster

  1. Rename table CUSTOMER to CUSTOMER_ORIG
  2. For each column containing secure data create dimension (lookup) tables with a unique list of values for the column and a lookup id. You should create a primary key on the ID and a single unique index on the value and ID columns.
  3. Build a secure version of table CUSTOMER, called CUSTOMER_SEC replacing literal secure values with IDs from their corresponding dimension tables. You can add foreign keys between CUSTOMER_SEC and the lookup tables.
  4. Create a database view which presents the data from CUSTOMER_SEC as it would have looked when querying the original CUSTOMER table.
  5. Drop table CUSTOMER_ORIG
  6. Create a synonym called CUSTOMER on view CUSTOMER_VIEW
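As a sketch, steps 1, 2, 3 and 6 might look as follows in Oracle SQL for the CUST_FIRSTNAME column (the other sensitive columns follow the same pattern; the ID seed and index names are illustrative):

```sql
-- Step 1: preserve the original table
RENAME CUSTOMER TO CUSTOMER_ORIG;

-- Step 2: dimension table with a unique list of values and lookup IDs
CREATE TABLE FIRSTNAME_DIM (
  FIRSTNAME_ID NUMBER       PRIMARY KEY,
  FIRSTNAME    VARCHAR2(50) NOT NULL
);

INSERT INTO FIRSTNAME_DIM (FIRSTNAME_ID, FIRSTNAME)
SELECT 1000 + ROWNUM, FIRSTNAME
FROM  (SELECT DISTINCT CUST_FIRSTNAME AS FIRSTNAME FROM CUSTOMER_ORIG);

CREATE UNIQUE INDEX FIRSTNAME_DIM_UX ON FIRSTNAME_DIM (FIRSTNAME, FIRSTNAME_ID);

-- Step 3: secure table with literals replaced by dimension IDs
CREATE TABLE CUSTOMER_SEC AS
SELECT c.CUST_ID,
       f.FIRSTNAME_ID AS CUST_FIRSTNAME_ID
       -- ... CUST_LASTNAME_ID and CUST_SSN_ID resolved the same way
FROM   CUSTOMER_ORIG c
JOIN   FIRSTNAME_DIM f ON f.FIRSTNAME = c.CUST_FIRSTNAME;

-- Step 6: point existing queries at the view
CREATE OR REPLACE SYNONYM CUSTOMER FOR CUSTOMER_VIEW;
```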

That takes care of querying. For DML you will need to create triggers that translate INSERT, UPDATE and DELETE requests into the corresponding DML for the cluster of tables.
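For example, an INSTEAD OF trigger on the view could handle inserts. This is a sketch covering the first-name column only; FIRSTNAME_SEQ is an assumed sequence, and a real implementation would handle the remaining columns and concurrent inserts of the same new value:

```sql
-- Translate inserts on the view into inserts on the table cluster
CREATE OR REPLACE TRIGGER CUSTOMER_VIEW_INS
INSTEAD OF INSERT ON CUSTOMER_VIEW
FOR EACH ROW
DECLARE
    v_fn_id FIRSTNAME_DIM.FIRSTNAME_ID%TYPE;
BEGIN
    -- Resolve the dimension entry, creating it if the value is new
    BEGIN
        SELECT FIRSTNAME_ID INTO v_fn_id
        FROM   FIRSTNAME_DIM
        WHERE  FIRSTNAME = :NEW.CUST_FIRSTNAME;
    EXCEPTION WHEN NO_DATA_FOUND THEN
        INSERT INTO FIRSTNAME_DIM (FIRSTNAME_ID, FIRSTNAME)
        VALUES (FIRSTNAME_SEQ.NEXTVAL, :NEW.CUST_FIRSTNAME)
        RETURNING FIRSTNAME_ID INTO v_fn_id;
    END;

    INSERT INTO CUSTOMER_SEC (CUST_ID, CUST_FIRSTNAME_ID /* ... */)
    VALUES (:NEW.CUST_ID, v_fn_id /* ... */);
END;
/
```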

 

Increase Production Database Security

At this stage it looks like we have just made it more complex to access the data that was originally held in the CUSTOMER table. Well, we have, but that is a good thing. We can now more easily improve production database security by doing the following:

  1. Encrypt the dimension/lookup tables

There is no need to encrypt any other data in the database since we have isolated the secure data.  This avoids encrypting the entire database simply because we do not know where the secure data resides.  This also has some performance benefits as the majority of data accesses will not require decryption.
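For instance, with Oracle Transparent Data Encryption (assuming the Advanced Security option is licensed and a keystore is open; the tablespace name is illustrative), only the dimension tables need to move into encrypted storage:

```sql
-- Create an encrypted tablespace for the sensitive values only
CREATE TABLESPACE secure_dims
  DATAFILE SIZE 100M
  ENCRYPTION USING 'AES256' DEFAULT STORAGE (ENCRYPT);

-- Move only the dimension tables; CUSTOMER_SEC stays unencrypted
ALTER TABLE FIRSTNAME_DIM MOVE TABLESPACE secure_dims;
ALTER TABLE LASTNAME_DIM  MOVE TABLESPACE secure_dims;
ALTER TABLE SSN_DIM       MOVE TABLESPACE secure_dims;
-- Remember to rebuild any indexes invalidated by the move
```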

  2. Enable auditing on access to those tables

Auditing on tables is an overhead but by isolating the secure data we have minimised the audit activity.
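With traditional Oracle auditing, for example, this is one statement per dimension table:

```sql
-- Audit every access to the tables holding the actual sensitive values
AUDIT SELECT, INSERT, UPDATE, DELETE ON FIRSTNAME_DIM BY ACCESS;
AUDIT SELECT, INSERT, UPDATE, DELETE ON LASTNAME_DIM  BY ACCESS;
AUDIT SELECT, INSERT, UPDATE, DELETE ON SSN_DIM       BY ACCESS;
```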

  3. Apply Redaction policies to all value data in the dimension/lookup tables

Again, by isolating the secure data we know exactly which data needs Redaction policies applied. Redaction hides sensitive information from production database users based on role-based access.
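As a sketch using Oracle Data Redaction (assuming the feature is available; the schema, policy name and role are illustrative), the SSN values could be fully redacted for users without a privileged role:

```sql
BEGIN
  DBMS_REDACT.ADD_POLICY(
    object_schema => 'APP',
    object_name   => 'SSN_DIM',
    column_name   => 'SSN',
    policy_name   => 'redact_ssn',
    function_type => DBMS_REDACT.FULL,
    -- Redact whenever the session does not hold the DATA_ADMIN role
    expression    =>
      'SYS_CONTEXT(''SYS_SESSION_ROLES'',''DATA_ADMIN'') = ''FALSE''');
END;
/
```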

 

Near Instant data masking

To produce a masked version of a database with the above changes in place, we only need to provide masked versions of the dimension/lookup tables and drop the originals. Even if we build these replacement tables from scratch, the process will still be faster than masking the original format: there is less data to scan (shorter rows) and, thanks to de-duplication, fewer rows to mask.
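On the clone, each dimension table can then simply be swapped for a masked version. This is a sketch: the placeholder masking rule stands in for your masking tool's output, and any foreign keys would need to be dropped and recreated around the swap:

```sql
-- Build a replacement dimension with the same IDs but masked values
CREATE TABLE FIRSTNAME_DIM_MASKED AS
SELECT FIRSTNAME_ID,
       'FN_' || FIRSTNAME_ID AS FIRSTNAME  -- placeholder masking rule
FROM   FIRSTNAME_DIM;

-- Swap the masked version in and drop the table holding real values
DROP TABLE FIRSTNAME_DIM PURGE;
RENAME FIRSTNAME_DIM_MASKED TO FIRSTNAME_DIM;
```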

Incremental masking would also be easier if we used versioning of masked tables and masked only the necessary deltas.

 

Objection … overruled?

The obvious concern is that this approach requires schema changes and potentially application changes. There is no escaping that and it means that this approach to security is likely only applicable to custom applications where the schema is under your control.

The good news though is that usually only a small fraction of tables in a schema contain sensitive data and so the amount of effort may be far less than anticipated.

 

Summary

The benefits are immense: more secure production databases and faster, potentially near instant, provisioning of secure masked databases. Both production and masked databases will also likely perform faster for scan-based operations due to the implicit compression afforded by replacing literal values with numeric identifiers.

This approach improves data security and performance while using fewer resources.


Delphix Data Transport Shuttle

Move, Copy or Replicate Data – sounds easy?

In IT, it often transpires that tasks which appear difficult turn out to be quite easy, though clearly not often enough. Conversely, tasks that appear easy all too often turn out to be rather difficult. Moving, copying or replicating data between data centres, including public clouds, turns out to be more challenging than it first appears.

Of course, there are many point solutions. It is easy to transfer or copy files from server to server; databases have physical and logical replication; ETL can logically synchronise data between two data repositories; and storage replication is commonly used between data centres that share the same storage, with High Availability as the primary use case. You can also physically transport a backup copy of your data and restore it at the target end.

The problem is that few of these solutions are fully automated: they require custom configuration for each source, storage replication needs the same storage at each end, and physical transportation is a manual process suitable only for discrete copies, not for continuous replication or synchronisation.

Delphix’ best kept Secret:  Delphix Replication

Delphix Replication allows you to easily transport and synchronise any data between two Delphix Engines regardless of where the Delphix Engines are deployed.   This could be on-premise to on-premise, on-premise to cloud (Azure, AWS, IBM Cloud), cloud to cloud or cloud to on-premise.

Delphix Replication is also bi-directional.

Did you say any data?

Yes. Delphix supports a variety of Unix, Linux and Windows platforms. Another of our open secrets is that you can push almost any data (documents, files, folders, backup images, database files) running on any Delphix supported platform directly into Delphix, and that data can then leverage Delphix Data Platform capabilities implicitly. The Delphix data copy (dSource) is compressed, de-duplicated and versioned regardless of how the data is ingested. Copies can then be provisioned from any point in time, included in a Data Pod with full self-service, or replicated to another Delphix Engine.

Does it have to be Virtualised?

No.  Once a Delphix Data Copy (dSource) is created, any point in time copy can be created as either a virtual copy or a physical copy.

A single fully automated data transport solution

Delphix Replication can act as a data shuttle.   Delphix can automatically ingest the data or you can push data into Delphix on your own schedule.  Once Delphix has a copy it can automatically transport any selected data between data centres or clouds and enable self-service provisioning of physical or virtual copies locally or remotely.

You can put all the data you need to transport between data centres or clouds into Delphix.

Does it require high bandwidth?

No, you can use what our Support team refer to as a “wet piece of string”. For example, we have a customer using Delphix Replication between Hong Kong and China over a 50 Mbps link, and another replicating out of Argentina over a 30 Mbps link. It’s a matter of physics: the higher the bandwidth and the lower the latency, the better the throughput.

The data is already compressed and is encrypted when transported using the Delphix Session Protocol (DSP).

It’s not an option

What I mean is, it is not an extra cost option: Delphix Replication is free with any two or more Delphix Engines. It is available, and does work, with a single Delphix Engine, but is not nearly as useful in such a deployment.

To Summarize

Replicating data between data centres and clouds is currently plagued by a variety of point solutions. It is a non-trivial task and is increasingly required as customers start spreading workload between data centres and clouds, not just for high availability but for scaling workload, enriching data by tapping into remote services and improving vendor support.

Delphix simplifies data transportation and as a Data Platform makes the entire process consistent, simple, safe and easy to manage with out of the box services including compression, de-duplication, replication, encryption and virtualisation.

If you have data on any of Delphix’ supported Operating Systems, Linux (RHEL, CentOS, OEL, SUSE), Solaris, HP-UX, IBM AIX or Windows, that you want to easily send or synchronise to another data centre or cloud, try Delphix Replication.


Delphix Data Virtualisation enabled Data Masking

GDPR

How Do You Solve A Problem Like… Masking?

Well, as in the film “The Sound of Music” (for you youngsters), we could burst into song, but most customers are more inclined to burst into tears.

Providing secure data for non-production use is an expensive exercise whether you synthesize or mask (obfuscate) data.  Encryption is much easier but not really an option for non-production use since it is reversible.

In addition, GDPR has raised the priority of securing non-production data with more binding legal requirements and stiffer penalties for non-compliance.

Delphix can help with self-funded, simplified masking and the ability to easily leverage lower cost cloud resources using masked data.  In this article, I outline how Delphix Data Virtualisation not only streamlines data masking but can also help fund it as well.

Challenge 1: I need a database copy to mask

The first challenge for a data masking exercise is that you need a copy of the data to mask. If your database is multi-terabyte, where are you going to place that copy? It has to be in the production zone because, until it is masked, it contains sensitive data. If you have space for a full copy but need to refresh that copy, you will likely need to overwrite the previous copy and deny access to existing users for the duration of the copy and mask process.

Solution 1: Virtual Data Provisioning and Versioning

Delphix Data Virtualisation allows you to create a full read-write copy of an entire database with minimal storage and almost zero overhead for the initial copy. Better still, Delphix Data Virtualisation allows you to create a new version without impacting previous versions, allowing continuous production of new masked versions and uninterrupted access to previous ones.

Challenge 2:  I need to refresh several environments with the latest copy

With physical copies you need to copy the entire newly masked database to each environment and, as with the original provisioning, you will likely need to overwrite the previous copy and deny access to existing users for the duration of the copy process.

 

Secure Data Generation Pipeline and Consumption

Solution 2: Virtual Data Child Copies, Refresh, Branching and Versioning

Once you have masked your virtual data copy you can then create as many virtual child copies as required, all now sharing the set of blocks that were modified as a result of the data masking process, typically between 10-15% of data blocks.

Delphix Data Virtualisation allows you to refresh an environment on demand.  Even after refreshing your database with the latest masked copy, the previous versions are still accessible and you can switch between versions on demand.

Challenge 3:  I need to provision several masked databases at a consistent point in time

The first challenge is that you need a masking solution that masks consistently across databases, even from different vendors. You also need to be able to provision the initial set of databases for masking at a consistent point in time.

Solution 3: Virtual Data Point in Time Provisioning and Data Pods

Delphix Data Virtualisation allows you to provision multiple databases from the same or different vendors to the same point in time.  These virtual copies can be grouped into Data Pods which are assigned to projects or individuals.  All self-service operations on the Data Pods including, start, stop, rewind, refresh, bookmark, share and switch version automatically apply to all the components in the Data Pod.


 

Challenge 4:  I need to get my masked data onto a Cloud

One of the commercial benefits of data masking is that it enables customers to leverage lower cost cloud resources using secured data.   Unfortunately, it is not always that easy to upload data to the cloud and keep the cloud copy updated with refreshed versions.  Often this requires setting up some form of replication or backup and restore process for each database.

Solution 4: Delphix Replication

Delphix Replication allows you to replicate only masked database copies between Delphix Engines, which can be either on premise or on cloud. The replication process is completely automated, and new database versions only require the changed, compressed blocks to be sent to keep the target engine in sync.

Challenge 5:  Cost of non-production copies and data masking

Data Masking adds additional licensing, processing and operating costs.

Solution 5: Delphix Data Virtualisation

Delphix Data Virtualisation will typically reduce your non-production storage footprint by 70-90%. In addition, Delphix turnkey, end-to-end automation reduces operational costs and accelerates development and test cycles. This combination of savings will typically fund your Delphix Virtualisation and Data Masking investments within 6-9 months.

To Summarize

Data Masking is a great example of how Delphix Data Virtualisation can streamline and accelerate the flow of data for use in non-production, whether on premise or on cloud. In this article, we have not even looked at the benefits of Delphix Data Masking itself.

Any customer performing Data Masking using any Data Masking solution will benefit from Delphix Data Virtualisation.


Delphix enabled Logical Data Warehouse


Image courtesy of Business Cloud News

What is a Logical Data Warehouse?

A Logical Data Warehouse uses federated queries to query source databases, and other data sources, directly, avoiding the need to copy data into a dedicated data warehouse. The key benefits to this approach over traditional data warehouse solutions are:

  1. Access the latest version of the data, not a copy from the night before or last weekend
  2. Easily add new Data Sources and adjust data models on existing data sources much more quickly
  3. Reduced Infrastructure: less server capacity and storage is required because you leverage the source environments.

A Logical Data Warehouse is often called a Virtual Data Warehouse or a Federated Data Warehouse.
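As a simple illustration in Oracle terms, a federated query might join tables that live in two different source databases over database links (the link and table names are hypothetical; dedicated LDW products use their own connectors rather than database links):

```sql
-- One logical query spanning two physical sources: no data is copied,
-- each source is queried in place at the moment the query runs
SELECT c.customer_name, SUM(o.order_total)
FROM   customers@crm_db c
JOIN   orders@sales_db  o ON o.customer_id = c.customer_id
GROUP  BY c.customer_name;
```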

To Copy or not to Copy, that is the question

As mentioned, there are two main approaches to providing access to data for analytics purposes. Either you copy the data to an analytics or data warehouse platform, or you keep the data in place, implement a unified data model and use a set of connectors to query the data directly from its sources.

There is a variation of the latter, mainly in OLAP solutions, where pre-calculated query results are materialised into views or multi-dimensional cubes to improve response times for predicted queries.

Solutions that require their own copy of the data include Exadata, Teradata, Netezza, Hadoop, Snowflake, Amazon Redshift, Azure SQL Data Warehouse and Google BigQuery.

Logical (or Virtual) Data Warehouse vendors include Denodo, Cisco / Composite, Red Hat JBoss Data Virtualisation, DataVirtuality, and SAP Hana.

Similar to Delphix, which avoids all the costs and delays of provisioning copies of data, a Logical Data Warehouse (LDW) avoids the costs and delays of copying data, but it still needs continuous access to the source data. With Delphix, the opportunity exists to build a Logical Data Warehouse on top of Virtual Databases. Why would you want to do that?

 

Virtual Data Sources for Logical Data Warehouses

Using a logical data warehouse, customers may be concerned about the performance hit on mission-critical source systems exposed directly to logical data warehouse queries.

Connecting your logical data warehouse to Delphix Virtual Databases has the following advantages:

  1. Protect Production, Delphix offloads production by provisioning virtual data copies on separate database servers using shared storage managed by Delphix.
  2. Enhance Performance, run as many virtual databases as required to spread the load with no storage overhead.
  3. Enhance Security, Delphix can provision original or secured (consistently masked) virtual database copies avoiding the need and cost of masking an entire warehouse data copy.
  4. Time Travel, Delphix can provision consistent database sets from any point in time within the configurable retention period, enabling much simpler period to period based analytics over much smaller data sets.
  5. Reduced Infrastructure, leverage under-utilized but licensed, non-production servers.

You probably already have sufficient licensed database servers to accommodate a data warehouse solution without building or renting additional infrastructure or purchasing additional licenses.

  6. Ground or Cloud, Delphix can run in your data center or on AWS or Azure. Delphix Replication allows you to synchronize your data in the cloud, letting you run your Logical Data Warehouse against co-located database copies.

 

To Summarize

Delphix is an outstanding complement to Logical Data Warehouse solutions, in particular for provisioning high performance, secure, virtual copies of your most important data.


Data Masking: Locating Sensitive Data

Data Masking is a Sensitive Subject

One of the primary concerns with Data Masking is identifying the locations of sensitive data within a database. For custom applications, the development team will likely know where all sensitive data is stored; with third-party applications, however, this is a bigger challenge, especially considering some database schemas can have tens of thousands of tables.

Delphix Masking has a Profiling tool which uses regular expressions to locate potentially sensitive data based on the column name or a sample of the data itself. This is easily extended with patterns for any custom-formatted sensitive data, and the profiling tool can then find any additional occurrences of that data in any supported database or file type.
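For instance, a custom two-letter, six-digit, one-letter identifier format like the SSNs used earlier in this blog could be matched with a pattern such as the following (an illustrative regular expression, shown here applied with Oracle's REGEXP_LIKE; the table and column names are hypothetical):

```sql
-- Sample a column and count values matching the custom sensitive format
SELECT COUNT(*)
FROM   CUSTOMER SAMPLE (10)
WHERE  REGEXP_LIKE(CUST_SSN, '^[A-Z]{2}[0-9]{6}[A-D]$');
```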

The profiling tool cannot always locate all sensitive data so an additional technique may help.

Customers are always right and sometimes Innovative

Discussing this issue recently with David Battiston from Worldpay, I suggested we could create a Delphix VDB database copy and run functional tests covering sensitive transactions such as customer account creation, modification and ordering, or, for a web-based application, any screen using https. We would use sentinel data in the tests so we could easily spot it in the database. I proposed using a text index on the database, but David suggested using database auditing.

What a great idea, I decided to give it a try.

Test Process

I downloaded the Community Edition of an eCommerce application called Konakart.
I created an Oracle Virtual Database as the database to underpin Konakart and created database user KKART to own the Konakart schema.

-- Create database user KKART to own Konakart schema

CREATE USER kkart IDENTIFIED BY kkart
    DEFAULT TABLESPACE users
  TEMPORARY TABLESPACE temp
    QUOTA UNLIMITED ON users;

GRANT connect,resource TO kkart;

 

I then installed the Konakart application which populated the VDB with the Konakart schema and some demo data. The Konakart schema has around 128 tables.

I then enabled database auditing in the Oracle VDB.

-- Enable extended database auditing

ALTER SYSTEM SET audit_trail=db,extended SCOPE=SPFILE;
shutdown
startup

-- Enable audit for database schema owner KKART
AUDIT ALL BY kkart BY ACCESS;
AUDIT SELECT TABLE, UPDATE TABLE, INSERT TABLE,
      DELETE TABLE BY kkart BY ACCESS;
AUDIT EXECUTE PROCEDURE BY kkart BY ACCESS;

 

Insert Sentinel Data

Using the Konakart application I added new customer details and placed an order.

Below is a summary of the details: the new customer was Donald Trump of 123 Washington Road, Washington, telephone 555-666-1234, postcode 666, email djtrump@whitehouse.com.

Looks like Donald has just purchased a printer!

Locate Sentinel Data

Now we just have to query the audit tables to locate where these details have been recorded.

select SQL_TEXT, SQL_BIND
  from dba_common_audit_trail
 where OBJECT_SCHEMA = 'KKART'
   and TRANSACTIONID is not null /* exclude failed statements */
   and SQL_BIND like '%123 Washington Road%'; /* sentinel data */

And the result was two tables, ADDRESS_BOOK and ORDERS, with quite a few columns.

INSERT INTO address_book
(address_book_id, entry_email_address, entry_company, entry_city,  entry_lastname, entry_firstname, entry_gender, entry_telephone,
 entry_telephone_1, entry_suburb, entry_zone_id, entry_country_id, entry_street_address_1, entry_postcode, entry_street_address,customers_id)
 VALUES (:1 ,:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:9 ,:10 ,:11 ,:12 ,:13 ,:14 ,:15 ,:16 )

#4 entry_city = Washington
#5 entry_lastname = Trump
#6 entry_firstname = Donald
#8 entry_telephone = 555-666-1234
#14 entry_postcode = 666
#15 entry_street_address = 123 Washington Road

 

INSERT INTO orders
(billing_telephone, customers_telephone,customers_locale,
 orders_status, lifecycle_id, delivery_country,
 filterdecimal1, points_reservation_id, orders_id,
 billing_name, customers_name, billing_street_address,
 billing_country, delivery_name, billing_city, date_purchased,
 billing_state, customers_city, points_used, currency,
 parent_id, delivery_addr_id, billing_postcode,
 delivery_postcode, payment_module_code,
 billing_address_format_id, payment_method, delivery_city,
 customers_country, customers_state, customers_id,
 customer_addr_id,points_awarded, customers_postcode,
 last_modified, customers_email_address, currency_value,
 delivery_telephone, customers_street_address, delivery_state,
 delivery_address_format_id, delivery_street_address,
 shipping_module_code, billing_addr_id, orders_number,
 customers_address_format_id) VALUES (:1 ,:2 ,:3 ,:4 ,:5 ,:6 ,
  :7 ,:8 ,:9 ,:10 ,:11 ,:12 ,:13 ,:14 ,:15 ,:16 ,:17 ,:18 ,:19 ,
  :20 ,:21 ,:22 ,:23 ,:24 ,:25 ,:26 ,:27 ,:28 ,:29 ,:30 ,:31 ,
  :32 ,:33 ,:34 ,:35 ,:36 ,:37 ,:38 ,:39 ,:40 ,:41 ,:42 ,:43 ,
  :44 ,:45 ,:46 )


#1 billing_telephone = 555-666-1234
#2 customers_telephone = 555-666-1234
#6 delivery_country = United States
#10 billing_name = Donald Trump
#11 customers_name = Donald Trump
#12 billing_street_address = 123 Washington Road
#13 billing_country = United States
#14 delivery_name = Donald Trump
#15 billing_city = Washington
#17 billing_state = Washington
#18 customers_city = Washington
#23 billing_postcode = 666
#24 delivery_postcode = 666
#28 delivery_city = Washington
#29 customers_country = United States
#30 customers_state = Washington
#34 customers_postcode = 666
#36 customers_email_address = djtrump@whitehouse.com
#38 delivery_telephone = 555-666-1234
#39 customers_street_address = 123 Washington Road
#40 delivery_state = Washington
#42 delivery_street_address = 123 Washington Road

  

So Konakart is unusual in using clear column and table names, but that is not the case for many third-party applications. Using this technique, we can semi-automate the detection of sensitive data regardless of database schema complexity or naming conventions.

To Summarize

The first challenge with data masking is getting hold of a full copy of the database to be masked. A Delphix Virtual Database gives you a personal pop-up full copy in minutes.

The second challenge with data masking is deciding which columns to mask. If you do not already know, you could use a technique like the one above, using sentinel data and database auditing. Either way, once you have identified the location of any sensitive data you can use the Delphix Masking profiling tool to locate further occurrences in the same or different databases and files.

 

Delphix Cloud Economics for on Premise


Image courtesy of vyopta.com

Cloud Economics

Many customers are examining how they can leverage public clouds for reduced costs and increased agility.

Public Clouds save costs through automation, higher server utilisation, lower storage costs and consumption based charging.

Public clouds are seen by many as a great way to secure expanded capacity to handle demand spikes for seasonal or overflow activities, allowing you to keep your on-premise IT resources trimmed to handle only Business as Usual (BAU) activity.

What about running BAU on Cloud? What is the cost per database on Cloud versus on premise, factoring in database server, network, storage and license costs?   Some customers have done this calculation and the answer may surprise you.

Delphix Economics

Let’s examine how Delphix enables the same core cost saving measures as Cloud solutions.

1. Automation

Delphix automates the entire end-to-end process, from data synchronisation to provisioning. These policy-driven, automated workflows can also include Data Masking and Replication. End users can manage their own full-copy data sets but are limited to pre-allocated server resources to avoid consumption creep.

2. Server Utilisation

The two primary methods to increase server utilisation are density (increased concurrent instances) and hoteling (drop in environments).

Delphix enables increased density by allowing you to run more database instances on the same database server. Customers have reported 30% increases in concurrent instances per server. Often a server is maxed out on memory, but Delphix allows you to reduce the memory allocation per instance without compromising performance, thanks to the caching effect of the Delphix shared cache.

Delphix enables hoteling by allowing an environment to be provisioned and de-provisioned quickly and automatically. This allows customers to pool existing siloed and implicitly underutilised database servers and share these servers between teams and across time zones. Test servers can be kept busier by supporting tests from different projects and sprints, simply by lifting the constraint on data provisioning.

Take a look at your current test server utilisation, imagine if you could increase that by 30% or better.

3. Storage

The best way to save storage costs is to avoid using storage altogether.   Delphix reduces non-production storage consumption by at least 70%.

Here’s the math:

Compressed source copy = 50% of source size
Non-production copies = 4 (400% of source size if physical)

Storage consumption for 4 virtual copies = 50% / 400% = 1/8 = 12.5%
Storage saving = 100% - 12.5% = 87.5%

As with most of these calculations, it depends. In reality the savings are generally so substantial that you can use the highest performing storage available, because you don’t have to buy much of it. You then get the additional benefit of increased test throughput, allowing more tests, increased quality and reduced release cycles.

The Nirvana

The best solution for most customers will be an on premise private cloud for BAU activity linked to a public cloud, or two, for burst activity.

Delphix will make your private cloud as economical as a public cloud with the added advantage of supporting true agile development for database based applications, i.e. the really important ones.

For public cloud, Delphix supports AWS and Azure. Delphix will seamlessly replicate data between on-premise and cloud, cloud to cloud and cloud to on-premise, allowing you to switch workloads between clouds without worrying about moving data.

 

Delphix Secure Data Subset Replication


VLDB Data Subsets

Many Delphix customers are virtualising enormous, Very Large Databases (VLDBs), with some tables running into tens of billions of rows. Occasionally these databases grow because there is no archiving and purging process in place, but increasingly the current active working set of data is growing due to rising transaction levels and consolidation of databases of record. The transactional databases of today are now substantially larger than the data warehouses of yesterday.

In general with Delphix there is no requirement to provision data subsets since Delphix virtual databases can be provisioned in a few minutes regardless of size and with minimal storage overhead. However for some customers there is a requirement to provision only the latest few days, weeks or months of data for faster functional testing.

Delphix Selective Data Replication

Delphix Replication is used to provide resilience, scalability, data migration and data segregation.   With Delphix Replication you can choose to replicate any set of virtual databases and this is used to ensure only secure, masked, data is replicated to a second ‘replica’ Delphix Engine.

This same capability allows us to replicate a virtual database which has been trimmed to a more manageable size, a subset if you will. For some databases, like Oracle, dropping tables and deleting data does not release space, so you usually have to drop tablespaces and their corresponding datafiles, which is of course much easier if you have partitioned or transportable tablespaces.

What’s the Recipe?

The steps are relatively straightforward.

  1. Create a VDB on the primary Delphix Engine using a configure clone hook to run a script to trim the VDB
  2. Configure a masking job to execute after the trim process.
  3. Bounce the VDB once provisioned
  4. Create a Replication job with Selective Data Distribution enabled.
  5. Choose the trimmed and masked VDB to be replicated.
  6. Run the replication job.
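The trim script in step 1 might, for a range-partitioned Oracle schema, look something like this (a sketch with illustrative object names; as the article notes, it is dropping tablespaces and datafiles that actually releases the space):

```sql
-- Keep only the most recent data by dropping old partitions
ALTER TABLE orders DROP PARTITION orders_2015;
ALTER TABLE orders DROP PARTITION orders_2016;

-- Dropping the old tablespace and its datafiles returns the space
DROP TABLESPACE archive_2015 INCLUDING CONTENTS AND DATAFILES;
```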

Additional advantages of this approach are that you need even less storage assigned to the replica Delphix Engine, and you only have to mask the trimmed database, which makes the masking process proportionately faster in line with the reduction in size.

To Summarize

If you have substantial databases you can create full virtual copies of these in production for production support, reporting and ETL.  You can replicate these full virtual copies to a replica engine for full volume functionality and performance testing.

With Delphix Selective Data Distribution Replication you also have the option to replicate secure data subsets for faster functionality testing.

Since a Delphix Replication Engine can be located on-premise or on-cloud you have the choice of what to replicate, how much to replicate and where to replicate.