Delphix Data Transport Shuttle

 Move, Copy or Replicate Data – sounds easy?

In IT, tasks that appear difficult sometimes turn out to be quite easy, though not as often as we would like.  Conversely, tasks that appear easy all too often turn out to be rather difficult.  Moving, copying or replicating data between data centres, including Public Clouds, turns out to be more challenging than it first appears.

Of course, there are many point solutions.  It is easy to transfer or copy files from server to server, databases offer physical and logical replication, ETL can logically synchronise data between two data repositories, and storage replication is commonly used between data centres that share the same storage, primarily for High Availability.  You can also physically transport a backup copy of your data and restore it at the target end.

The problem is that few of these solutions are fully automated, and they require custom configuration for each source.  Storage replication requires the same storage at each end, and physical transportation is a manual process suited only to discrete copies, not to continuous replication or synchronisation.

Delphix's best kept Secret: Delphix Replication

Delphix Replication allows you to easily transport and synchronise any data between two Delphix Engines regardless of where the Delphix Engines are deployed.   This could be on-premise to on-premise, on-premise to cloud (Azure, AWS, IBM Cloud), cloud to cloud or cloud to on-premise.

Delphix Replication is also bi-directional.

Did you say any data?

Yes.  Delphix supports a variety of Unix, Linux and Windows platforms.  Another of our open secrets is that you can push any data (documents, files, folders, backup images, database files, almost anything running on a Delphix supported platform) directly into Delphix, and that data can then leverage Delphix Data Platform capabilities implicitly.  The Delphix data copy (dSource) is compressed, de-duplicated and versioned regardless of how the data is ingested.  Copies can then be provisioned from any point in time, included in a Data Pod with full self-service, or replicated to another Delphix Engine.

Does it have to be Virtualised?

No.  Once a Delphix Data Copy (dSource) is created, any point in time copy can be created as either a virtual copy or a physical copy.

A single fully automated data transport solution

Delphix Replication can act as a data shuttle.   Delphix can automatically ingest the data or you can push data into Delphix on your own schedule.  Once Delphix has a copy it can automatically transport any selected data between data centres or clouds and enable self-service provisioning of physical or virtual copies locally or remotely.

You can put all the data you need to transport between data centres or clouds into Delphix.

Does it require high bandwidth?

No, you can use what our Support team refer to as a “wet piece of string”.  For example, we have one customer using Delphix Replication between Hong Kong and China over a 50 Mbps link, and another replicating from Argentina over a 30 Mbps link.  It is simply a matter of physics: the higher the bandwidth and the lower the latency, the better the throughput.

The data is already compressed and is encrypted when transported using the Delphix Session Protocol (DSP).

It’s not an option

What I mean is, it is not an extra-cost option: Delphix Replication is included free with any two or more Delphix Engines.  It is available with a single Delphix Engine and does work, but it is not nearly as useful in such a deployment.

To Summarize

Replicating data between data centres and clouds is currently plagued by a variety of point solutions.  It is a non-trivial task and is increasingly required as customers spread workloads between data centres and clouds, not just for high availability but for scaling workloads, enriching data by tapping into remote services and improving vendor support.

Delphix simplifies data transportation and, as a Data Platform, makes the entire process consistent, simple, safe and easy to manage with out-of-the-box services including compression, de-duplication, replication, encryption and virtualisation.

If you have data on any of Delphix's supported Operating Systems, Linux (RHEL, CentOS, OEL, SUSE), Solaris, HP-UX, IBM AIX or Windows, that you want to easily send or synchronise to another data centre or cloud, try Delphix Replication.


Delphix Data Virtualisation enabled Data Masking

GDPR

How Do You Solve A Problem Like… Masking?

Well, as in the film (for you youngsters, "The Sound of Music"), we could burst into song, but most customers are more inclined to burst into tears.

Providing secure data for non-production use is an expensive exercise whether you synthesize or mask (obfuscate) data.  Encryption is much easier but not really an option for non-production use since it is reversible.

In addition, GDPR has raised the priority of securing non-production data with more binding legal requirements and stiffer penalties for non-compliance.

Delphix can help with self-funded, simplified masking and the ability to easily leverage lower cost cloud resources using masked data.  In this article, I outline how Delphix Data Virtualisation not only streamlines data masking but can also help fund it as well.

Challenge 1: I need a database copy to mask

The first challenge for a data masking exercise is that you need a copy of the data to mask.  If your database is multi-terabyte, where are you going to place that copy?  It has to be in the production zone, because until it is masked it contains sensitive data.  If you do have space for a full copy but need to refresh it, you will likely have to overwrite the previous copy and deny access to existing users for the duration of the copy and mask process.

Solution 1: Virtual Data Provisioning and Versioning

Delphix Data Virtualisation allows you to create a full read-write copy of an entire database with minimal storage and almost zero overhead for the initial copy.  Better still, Delphix Data Virtualisation allows you to create a new version without impacting previous versions, enabling continuous production of new masked versions and uninterrupted access to previous versions.

Challenge 2:  I need to refresh several environments with the latest copy

With physical copies you need to copy the entire newly masked database to each environment and, as with the original provisioning, you will likely need to overwrite the previous copy and deny access to existing users for the duration of the copy process.

 

Figure: Secure Data Generation Pipeline and Consumption

Solution 2: Virtual Data Child Copies, Refresh, Branching and Versioning

Once you have masked your virtual data copy you can then create as many virtual child copies as required, all now sharing the set of blocks that were modified as a result of the data masking process, typically between 10-15% of data blocks.

Delphix Data Virtualisation allows you to refresh an environment on demand.  Even after refreshing your database with the latest masked copy, the previous versions are still accessible and you can switch between versions on demand.

Challenge 3:  I need to provision several masked databases at a consistent point in time

The first challenge is that you need a masking solution that masks consistently across databases, even from different vendors.  You also need to be able to provision the initial set of databases for masking at a consistent point in time.

Solution 3: Virtual Data Point in Time Provisioning and Data Pods

Delphix Data Virtualisation allows you to provision multiple databases, from the same or different vendors, to the same point in time.  These virtual copies can be grouped into Data Pods which are assigned to projects or individuals.  All self-service operations on a Data Pod, including start, stop, rewind, refresh, bookmark, share and switch version, automatically apply to all the components in the Data Pod.


Challenge 4:  I need to get my masked data onto a Cloud

One of the commercial benefits of data masking is that it enables customers to leverage lower cost cloud resources using secured data.   Unfortunately, it is not always that easy to upload data to the cloud and keep the cloud copy updated with refreshed versions.  Often this requires setting up some form of replication or backup and restore process for each database.

Solution 4: Delphix Replication

Delphix Replication allows you to replicate only masked database copies between Delphix Engines, which can either be on premise or on cloud.   The replication process is completely automated and new database versions only require compressed blocks to be sent to keep the target engine in sync.

Challenge 5:  Cost of non-production copies and data masking

Data Masking brings additional license, processing and operating costs.

Solution 5: Delphix Data Virtualisation

Delphix Data Virtualisation will typically reduce your non-production storage footprint by 70-90%.   In addition, Delphix turnkey, end to end automation reduces operational costs and accelerates development and test cycles.   This combination of savings will typically fund your Delphix Virtualisation and Data Masking investments in 6-9 months.

To Summarize

Data Masking is a great example of how Delphix Data Virtualisation can streamline and accelerate the flow of data for use in non-production, whether on premise or on cloud.  In this article, we have not even looked at the benefits of Delphix Data Masking itself.

Any customer performing Data Masking using any Data Masking solution will benefit from Delphix Data Virtualisation.

 

 

 

Delphix enabled Logical Data Warehouse


Image courtesy of Business Cloud News

What is a Logical Data Warehouse?

A Logical Data Warehouse uses federated queries to query source databases, and other data sources, directly, avoiding the need to copy data into a dedicated data warehouse. The key benefits to this approach over traditional data warehouse solutions are:

  1. Access the latest version of the data, not a copy from the night before or last weekend
  2. Easily add new Data Sources and adjust data models on existing data sources much more quickly
  3. Reduce infrastructure, decreasing the amount of server capacity and storage required because you leverage the source environments.

A Logical Data Warehouse is often called a Virtual Data Warehouse or a Federated Data Warehouse.
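
To make the idea concrete, here is a minimal sketch of a federated query using a plain Oracle database link.  Real Logical Data Warehouse products use their own connectors and a unified data model, so treat this purely as an illustration; the link, schema and column names are assumed, not taken from any particular product.

-- Illustrative only: expose a remote CRM table alongside a local table
-- through one view, so queries run against the sources directly
CREATE DATABASE LINK crm_link
  CONNECT TO report_user IDENTIFIED BY report_user
  USING 'crmdb';

CREATE OR REPLACE VIEW customer_orders_v AS
  SELECT c.customer_id,
         c.customer_name,
         o.orders_id,
         o.date_purchased
    FROM customers@crm_link c
    JOIN orders o
      ON o.customers_id = c.customer_id;

Queries against customer_orders_v always see the current data in both sources, with no copy to build or refresh.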

To Copy or not to Copy, that is the question

As mentioned, there are two main approaches to providing access to data for analytics.  Either you copy the data to an analytics or data warehouse platform, or you keep the data in place, implement a unified data model and use a set of connectors to query the data directly from its sources.

There is a variation of the latter, mainly in OLAP solutions, where pre-calculated query results are materialised into views or multi-dimensional cubes to improve response times for predicted queries.

Solutions that require their own copy of the data include Exadata, Teradata, Netezza, Hadoop, Snowflake, Amazon Redshift, Azure SQL Data Warehouse and Google BigQuery.

Logical (or Virtual) Data Warehouse vendors include Denodo, Cisco / Composite, Red Hat JBoss Data Virtualisation, DataVirtuality, and SAP Hana.

Similar to Delphix, which avoids all the costs and delays of provisioning copies of data, a Logical Data Warehouse (LDW) avoids all the costs and delays of copying data, but it still needs continuous access to the source data.  With Delphix, the opportunity exists to build a Logical Data Warehouse on top of Virtual Databases.  Why would you want to do that?

 

Virtual Data Sources for Logical Data Warehouses

When using a logical data warehouse, customers may be concerned about the performance impact on mission-critical source systems that are exposed directly to logical data warehouse queries.

Connecting your logical data warehouse to Delphix Virtual Databases has the following advantages:

  1. Protect Production, Delphix offloads production by provisioning virtual data copies on separate database servers using shared storage managed by Delphix.
  2. Enhance Performance, run as many virtual databases as required to spread the load with no storage overhead.
  3. Enhance Security, Delphix can provision original or secured (consistently masked) virtual database copies avoiding the need and cost of masking an entire warehouse data copy.
  4. Time Travel, Delphix can provision consistent database sets from any point in time within the configurable retention period, enabling much simpler period to period based analytics over much smaller data sets.
  5. Reduced Infrastructure, leverage under-utilized but licensed, non-production servers.

You probably already have sufficient licensed database servers to accommodate a data warehouse solution without building or renting additional infrastructure or purchasing additional licenses.

  6. Ground or Cloud, Delphix can run in your data center or on AWS or Azure. Delphix replication can synchronize your data to the cloud, allowing you to run your Logical Data Warehouse against co-located database copies.

 

To Summarize

Delphix is an outstanding complement to Logical Data Warehouse solutions, in particular for provisioning high performance, secure, virtual copies of your most important data.

 

 

Data Masking: Locating Sensitive Data

Data Masking is a Sensitive Subject

One of the primary concerns with Data Masking is identifying the locations of sensitive data within a database. For custom applications, the development team will likely know where all sensitive data is stored; however, with third-party applications this is a bigger challenge, especially considering some database schemas can have tens of thousands of tables.

Delphix Masking has a Profiling tool which uses regular expressions to locate potentially sensitive data based on the column name or a sample of the data itself. This is easily extended to add patterns for any custom formatted sensitive data and the profiling tool can then be used to find any additional occurrences of that data in any supported database or file type.
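
As an illustration of the same idea in plain SQL (the Delphix profiler has its own expression sets, and the schema, table, column and pattern below are assumptions for the example), you can scan a sample of a candidate column for values that match a custom format, say an account reference of two letters, a dash and six digits:

-- Hypothetical data-level check: sample roughly 10% of the rows and count
-- values shaped like a custom account reference, e.g. AB-123456
SELECT COUNT(*) AS suspect_rows
  FROM crm.payments SAMPLE (10)
 WHERE REGEXP_LIKE(reference_code, '^[A-Z]{2}-[0-9]{6}$');

A corresponding column-name pattern, such as '.*(account|acct)_?(ref|no).*', would serve the same purpose at the metadata level.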

The profiling tool cannot always locate all sensitive data so an additional technique may help.

Customers are always right and sometimes Innovative

Discussing this issue recently with David Battiston from Worldpay, I suggested we could create a Delphix VDB database copy and run functional tests covering sensitive transactions such as customer account creation, modification and ordering, or, for a web-based application, any screen using https. We would use sentinel data in the tests so we could easily spot it in the database. I proposed using a text index on the database, but David suggested using database auditing instead.

What a great idea, I decided to give it a try.

Test Process

I downloaded the Community Edition of an eCommerce application called Konakart.
I created an Oracle Virtual Database as the database to underpin Konakart and created database user KKART to own the Konakart schema.

-- Create database user KKART to own Konakart schema

CREATE USER kkart IDENTIFIED BY kkart
  DEFAULT TABLESPACE users
  TEMPORARY TABLESPACE temp
  QUOTA UNLIMITED ON users;

GRANT connect, resource TO kkart;

 

I then installed the Konakart application which populated the VDB with the Konakart schema and some demo data. The Konakart schema has around 128 tables.

I then enabled database auditing in the Oracle VDB.

-- Enable extended database auditing

ALTER SYSTEM SET audit_trail=db_extended SCOPE=SPFILE;
-- restart required because audit_trail is a static parameter
shutdown
startup

-- Enable audit for database schema owner KKART
AUDIT ALL BY kkart BY ACCESS;
AUDIT SELECT TABLE, UPDATE TABLE, INSERT TABLE,
      DELETE TABLE BY kkart BY ACCESS;
AUDIT EXECUTE PROCEDURE BY kkart BY ACCESS;

 

Insert Sentinel Data

Using the Konakart application I added new customer details and placed an order.

Below is a summary of the details.

Screenshot: the new customer and order details entered in Konakart

Looks like Donald has just purchased a printer!

Locate Sentinel Data

Now we just have to query the audit tables to locate where these details have been recorded.

select SQL_TEXT, SQL_BIND
  from dba_common_audit_trail
 where OBJECT_SCHEMA = 'KKART'
   and TRANSACTIONID is not null /* exclude failed statements */
   and SQL_BIND like '%123 Washington Road%'; /* sentinel data */

And the result was two tables, ADDRESS_BOOK and ORDERS, and quite a few columns.

INSERT INTO address_book
(address_book_id, entry_email_address, entry_company, entry_city,  entry_lastname, entry_firstname, entry_gender, entry_telephone,
 entry_telephone_1, entry_suburb, entry_zone_id, entry_country_id, entry_street_address_1, entry_postcode, entry_street_address,customers_id)
 VALUES (:1 ,:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:9 ,:10 ,:11 ,:12 ,:13 ,:14 ,:15 ,:16 )

#4 entry_city = Washington
#5 entry_lastname = Trump
#6 entry_firstname = Donald
#8 entry_telephone = 555-666-1234
#14 entry_postcode = 666
#15 entry_street_address = 123 Washington Road

 

INSERT INTO orders
(billing_telephone, customers_telephone,customers_locale,
 orders_status, lifecycle_id, delivery_country,
 filterdecimal1, points_reservation_id, orders_id,
 billing_name, customers_name, billing_street_address,
 billing_country, delivery_name, billing_city, date_purchased,
 billing_state, customers_city, points_used, currency,
 parent_id, delivery_addr_id, billing_postcode,
 delivery_postcode, payment_module_code,
 billing_address_format_id, payment_method, delivery_city,
 customers_country, customers_state, customers_id,
 customer_addr_id,points_awarded, customers_postcode,
 last_modified, customers_email_address, currency_value,
 delivery_telephone, customers_street_address, delivery_state,
 delivery_address_format_id, delivery_street_address,
 shipping_module_code, billing_addr_id, orders_number,
 customers_address_format_id) VALUES (:1 ,:2 ,:3 ,:4 ,:5 ,:6 ,
  :7 ,:8 ,:9 ,:10 ,:11 ,:12 ,:13 ,:14 ,:15 ,:16 ,:17 ,:18 ,:19 ,
  :20 ,:21 ,:22 ,:23 ,:24 ,:25 ,:26 ,:27 ,:28 ,:29 ,:30 ,:31 ,
  :32 ,:33 ,:34 ,:35 ,:36 ,:37 ,:38 ,:39 ,:40 ,:41 ,:42 ,:43 ,
  :44 ,:45 ,:46 )


#1 billing_telephone = 555-666-1234
#2 customers_telephone = 555-666-1234
#6 delivery_country = United States
#10 billing_name = Donald Trump
#11 customers_name = Donald Trump
#12 billing_street_address = 123 Washington Road
#13 billing_country = United States
#14 delivery_name = Donald Trump
#15 billing_city = Washington
#17 billing_state = Washington
#18 customers_city = Washington
#23 billing_postcode = 666
#24 delivery_postcode = 666
#28 delivery_city = Washington
#29 customers_country = United States
#30 customers_state = Washington
#34 customers_postcode = 666
#36 customers_email_address = djtrump@whitehouse.com
#38 delivery_telephone = 555-666-1234
#39 customers_street_address = 123 Washington Road
#40 delivery_state = Washington
#42 delivery_street_address = 123 Washington Road

  

Konakart is unusual in using clear column and table names, but that is not the case for many third-party applications. Using this technique, we can semi-automate the detection of sensitive data regardless of the database schema's complexity or naming conventions.

To Summarize

The first challenge with data masking is getting hold of a full copy of the database to be masked. A Delphix Virtual Database gives you a personal pop-up full copy in minutes.

The second challenge with data masking is knowing which columns to mask. If you do not already know, you could use a technique like the one above, based on sentinel data and database auditing.  Either way, once you have identified the location of any sensitive data you can use the Delphix Masking profiling tool to locate further occurrences in the same or different databases and files.

 

Delphix Cloud Economics for on Premise


Image courtesy of vyopta.com

Cloud Economics

Many customers are examining how they can leverage public clouds for reduced costs and increased agility.

Public Clouds save costs through automation, higher server utilisation, lower storage costs and consumption based charging.

Public clouds are seen by many as a great way to secure expanded capacity to handle demand spikes for seasonal or overflow activities, allowing you to keep your on-premise IT resources trimmed to handle only Business as Usual (BAU) activity.

What about running BAU on Cloud? What is the cost per database on Cloud versus on premise, factoring in database server, network, storage and license costs?   Some customers have done this calculation and the answer may surprise you.

Delphix Economics

Let’s examine how Delphix enables the same core cost saving measures as Cloud solutions.

1. Automation

Delphix automates the entire end-to-end process from data synchronisation to provisioning. These policy-driven, automated workflows can also include Data Masking and Replication.  End users can manage their own full-copy data sets but are limited to pre-allocated server resources to avoid consumption creep.

2. Server Utilisation

The two primary methods to increase server utilisation are density (more concurrent instances) and hoteling (drop-in environments).

Delphix enables increased density by allowing you to run more database instances on the same database server. Customers have reported 30% increases in concurrent instances per server.   Often a server is maxed out on memory but Delphix allows you to reduce the memory allocation per instance without compromising performance due to the caching effect of the Delphix shared cache.

Delphix enables hoteling by allowing an environment to be provisioned and de-provisioned quickly and automatically. This allows customers to pool existing siloed and implicitly underutilised database servers and share these servers between teams and across time zones. Test servers can be kept busier by supporting tests from different projects and sprints, simply by lifting the constraint on data provisioning.

Take a look at your current test server utilisation and imagine increasing it by 30% or more.

3. Storage

The best way to save storage costs is to avoid using storage altogether.   Delphix reduces non-production storage consumption by at least 70%.

Here’s the math:

Delphix compressed copy of the source ≈ 50% of production size
Non-production copies required = 4 (400% of production size if physical)

Delphix storage consumption for 4 copies = 50% / 400% = 1/8 = 12.5%
Storage saving = 100% - 12.5% = 87.5%

As with most of these calculations, it depends.  In reality the savings are generally so substantial that you can use the highest-performing storage available, because you do not have to buy much of it.  You then get the additional benefit of increased test throughput, allowing more tests for increased quality and shorter release cycles.

The Nirvana

The best solution for most customers will be an on premise private cloud for BAU activity linked to a public cloud, or two, for burst activity.

Delphix will make your private cloud as economical as a public cloud with the added advantage of supporting true agile development for database based applications, i.e. the really important ones.

For public cloud, Delphix supports AWS and Azure.  Delphix will seamlessly replicate data between on-premise and cloud, cloud to cloud and cloud to on-premise, allowing you to switch workloads between clouds without worrying about moving data.

 

Delphix Secure Data Subset Replication


VLDB Data Subsets

Many Delphix customers are virtualising enormous, Very Large Databases (VLDBs), with some tables running into tens of billions of rows.  Occasionally these databases grow because there is no archiving and purging process in place, but increasingly the current active working set of data is growing due to rising transaction levels and the consolidation of databases of record.  The transactional databases of today are substantially larger than the data warehouses of yesterday.

In general, with Delphix there is no need to provision data subsets, since Delphix virtual databases can be provisioned in a few minutes regardless of size and with minimal storage overhead. However, some customers need to provision only the latest few days, weeks or months of data for faster functional testing.

Delphix Selective Data Replication

Delphix Replication is used to provide resilience, scalability, data migration and data segregation.  With Delphix Replication you can choose to replicate any set of virtual databases, and this is used to ensure that only secure, masked data is replicated to a second ‘replica’ Delphix Engine.

This same capability allows us to replicate a virtual database that has been trimmed down to a more manageable size, a subset if you will. For some databases, such as Oracle, dropping tables and deleting data does not release space, so you usually have to drop tablespaces and their corresponding datafiles, which is of course much easier if you have partitioned or transportable tablespaces.

What’s the Recipe?

The steps are relatively straightforward.

  1. Create a VDB on the primary Delphix Engine, using a configure clone hook to run a script that trims the VDB (a sketch of such a trim script is shown after this list).
  2. Configure a masking job to execute after the trim process.
  3. Bounce the VDB once provisioned.
  4. Create a Replication job with Selective Data Distribution enabled.
  5. Choose the trimmed and masked VDB to be replicated.
  6. Run the replication job.
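
For illustration, here is a minimal sketch of what the trim script in step 1 might do for an Oracle VDB. It assumes the bulk of the history sits in date-based partitions in their own tablespace; the table, partition and tablespace names are purely illustrative.

-- Hypothetical trim script run from the configure clone hook
-- Drop the history partitions that are not wanted in the subset
ALTER TABLE app.orders DROP PARTITION orders_2016_h1 UPDATE INDEXES;
ALTER TABLE app.orders DROP PARTITION orders_2016_h2 UPDATE INDEXES;

-- Deleting data alone does not release space in Oracle, so also drop the
-- now-empty history tablespace and its datafiles
DROP TABLESPACE orders_hist INCLUDING CONTENTS AND DATAFILES;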

Additional advantages of this approach are that you need even less storage assigned to the replica Delphix Engine, and you only have to mask the trimmed database, so the masking process runs proportionately faster in line with the reduction in size.

To Summarize

If you have substantial databases you can create full virtual copies of these in production for production support, reporting and ETL.  You can replicate these full virtual copies to a replica engine for full volume functionality and performance testing.

With Delphix Selective Data Distribution Replication you also have the option to replicate secure data subsets for faster functionality testing.

Since a Delphix Replication Engine can be located on-premise or on-cloud you have the choice of what to replicate, how much to replicate and where to replicate.

 

Delphix and Data Subsets


Image courtesy of tpc.com

Who needs Data Subsets?

With Delphix there is generally no need for data subsets.  If the reason for creating data subsets is to save storage and improve provisioning times, Delphix Virtual Databases (VDBs) already avoid both.  Occasionally customers want to use data subsets to improve the performance of tests involving table scans and hence want smaller tables; the most common examples are batch jobs and reports.  For these tests, one option is to use more restrictive WHERE clauses or partitioned tables, and these changes can be hidden from the application by using database views.
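
As a sketch of that last point, a more restrictive predicate can be hidden behind a view so the application and the tests keep querying the same object name while scanning far fewer rows. The schema, table and column names here are assumed for the example.

-- Test sessions query ORDERS as usual but only see the last three months
CREATE OR REPLACE VIEW testuser.orders AS
  SELECT *
    FROM app.orders
   WHERE date_purchased >= ADD_MONTHS(SYSDATE, -3);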

However if you still want to proceed with data subsets Delphix can help with that too.

Agile Data Subsets

Almost all data subsetting involves creating a standalone, relationally intact set of data, which means you need to understand the relationships between all the tables needed to support your tests.  For many this is a huge stumbling block, and it is why the cost of creating subsets is high and hence the frequency of generation is low.

The good news is that it is much easier to create a data subset in a VDB, since you do not need to subset all the tables: all the tables are present and none of them incur a storage overhead.  You only need to subset the larger tables, usually the transaction and history tables involved in full table or slow scans.

For partitioned tables you can just drop the unwanted partitions.  For non-partitioned tables you can either delete rows, or more efficiently create a replacement table with the desired rows.
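
As a minimal sketch of the replacement-table approach (table and column names are illustrative, and in practice you would also recreate any indexes, constraints and grants on the new table):

-- Build the subset, then swap it in place of the original table
CREATE TABLE app.orders_subset AS
  SELECT *
    FROM app.orders
   WHERE date_purchased >= ADD_MONTHS(SYSDATE, -3);

DROP TABLE app.orders;
ALTER TABLE app.orders_subset RENAME TO orders;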

The only occasion when you need to factor in table relationships is when you are subsetting two tables where one has a dependency on the other.  That is easiest done using foreign keys, and if they do not already exist you can create them for the purposes of subsetting.  Since this is a VDB, a personal full read-write copy, you can manipulate it however you require.  You do not have to factor in any other relationships.
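
For the dependent-table case, a hypothetical sketch: trim the child table in step with the already trimmed parent, then add the foreign key if it does not already exist. The table and column names are illustrative, and the constraint assumes orders_id is the primary key of the parent.

-- Remove child rows whose parent rows have been trimmed away
DELETE FROM app.order_lines ol
 WHERE NOT EXISTS (SELECT 1
                     FROM app.orders o
                    WHERE o.orders_id = ol.orders_id);

-- Make the relationship explicit for any further subsetting
ALTER TABLE app.order_lines
  ADD CONSTRAINT fk_order_lines_orders
      FOREIGN KEY (orders_id) REFERENCES app.orders (orders_id);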

Secure Data Subsets

If you started your data subsetting process using a pre-masked VDB then your data subset is already secured.  If not, you can run the same Delphix Masking job against your data subset VDB.

To cap it all

Having streamlined your data subsetting process you can now create as many copies on demand as required using your data subset VDB as a parent VDB from which child VDBs can be created.

Once you have done all your tests against data subsets you can then run a full volume test against a copy of the original VDB.