Modern Data Management Blog Series: Part Two — Discover

INTRODUCTION

In part one of this four-part blog series on modern data management, we discussed the activities and deliverables involved in refining data and creating a successful data management system and project plan. Part two leverages the progress made in the Strategy phase to guide the second phase of the process — Discover.

Phase Two: Discover

As the name implies, the Discover phase focuses on activities that identify all current data-related elements and map the flow of data throughout its lifecycle. However, sifting through disparate data repositories and making sense of the processes (and personnel) involved in current data lifecycles can feel like untangling a web of knots.

While these exercises may seem daunting, identifying what elements exist, as well as those that need to be developed, will pay dividends towards supporting the organization’s data vision and accompanying business goals. Now, let’s dive into the specific activities in the Discover phase.

Activities 

As you work to establish the current state of data, you’ll discover what elements already exist and what elements need to be developed throughout your project timeline.

You’ll also begin to see the gaps between the current state of your data and the data vision statement you created in phase one (Strategy). To begin this exercise, you’ll need to create a current state of architecture diagram. Follow the steps below to begin the process.

Steps to develop your current state of architecture diagram

  • Identify the current applications and their dependencies
  • Identify the data/file stores and databases
  • Connect the end-to-end data flow
  • Discover on-premises or cloud infrastructure
  • Identify manual/automated processes
  • Identify ETL (Extract, Transform, Load) Jobs
  • Identify third-party integrations and dependencies
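
The inventory produced by these steps can be captured in a small, machine-readable form before it is drawn as a diagram. The following is a minimal sketch in Python, not a prescribed tool. Every component name (`crm_app`, `orders_db` and so on) and the `Component`/`Flow` structures are hypothetical and are not taken from the client example. It records each element, connects the data flow, and surfaces manual hand-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str     # "application", "data_store", "etl_job" or "third_party"
    hosting: str  # "on_premises", "cloud" or "external"


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    automated: bool  # False means a person moves the data by hand


# Hypothetical inventory, for illustration only.
components = [
    Component("crm_app", "application", "cloud"),
    Component("orders_db", "data_store", "on_premises"),
    Component("nightly_export", "etl_job", "on_premises"),
    Component("reporting_files", "data_store", "on_premises"),
    Component("payment_gateway", "third_party", "external"),
]

flows = [
    Flow("payment_gateway", "orders_db", automated=True),
    Flow("crm_app", "orders_db", automated=True),
    Flow("orders_db", "nightly_export", automated=True),
    Flow("nightly_export", "reporting_files", automated=False),
]


def manual_flows(flows):
    """Return the hand-offs that depend on a manual process."""
    return [f for f in flows if not f.automated]


def downstream(start, flows):
    """Return every component reachable from `start` by following the data flow."""
    reached, frontier = set(), [start]
    while frontier:
        current = frontier.pop()
        for f in flows:
            if f.source == current and f.target not in reached:
                reached.add(f.target)
                frontier.append(f.target)
    return reached


def unknown_endpoints(components, flows):
    """Return flow endpoints that are missing from the component inventory."""
    names = {c.name for c in components}
    return {n for f in flows for n in (f.source, f.target) if n not in names}
```

Calling `manual_flows(flows)` lists the hand-offs that are candidates for automation, and `unknown_endpoints(components, flows)` flags any flow that points at a system nobody has documented yet.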

Below is an example client’s current state of architecture diagram. In this example, the client has numerous manual processes and ETL jobs preventing them from scaling and delivering data to their customers in a timely manner. 

Current Architecture with ETL and Manual Processes

Identify Key Data Elements

Once the current data architecture has been mapped out, you’ll need to understand and identify different types of data and the data’s core elements. Doing so helps you analyze the various relationships and dependencies within your data. You should also be able to map data elements to business processes.

Data Types and Key Elements to identify: 

  • Master Data represents the people, places, products or other entities that describe an organization. It’s important to differentiate master data from transaction data, which changes frequently and is highly volatile. Take an invoice, for example. The invoice number and dollar amount are transaction data. Master data is the data describing the entities involved in the transaction, such as the product and the customer.
  • Reference Data provides context for transaction data by defining a set of valid values. It is also fundamental in evaluating data quality in operational processes. Common examples include the 50 U.S. state codes (MO, KS, CO, etc.), product categories and zip codes.
  • Metadata provides information about data. With metadata in place, organizations know what data they have, where the data resides, how that data is defined and how the data is produced. Metadata provides the foundation for data governance tools, such as Data Catalog, Data Lineage and Data Quality. Having a data governance program is also of utmost importance.
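
To make the distinction concrete, the invoice example can be sketched in Python. This is an illustration under stated assumptions: the field names, the sample records and the three-code `US_STATE_CODES` subset are all hypothetical. The transaction record points at master data by key, and reference data is used to check a value.

```python
# Reference data: a hypothetical subset of valid U.S. state codes.
US_STATE_CODES = {"MO", "KS", "CO"}

# Master data: the entities a transaction refers to (hypothetical records).
customers = {"C-100": {"name": "Acme Co.", "state": "MO"}}
products = {"P-200": {"name": "Widget", "category": "Hardware"}}

# Transaction data: one invoice, linked to master data by key.
invoice = {
    "invoice_number": "INV-0001",
    "amount": 149.50,
    "customer_id": "C-100",
    "product_id": "P-200",
}

# Metadata: information about the data itself (where it lives, how it is defined).
invoice_metadata = {
    "invoice_number": {"type": "str", "source": "billing system"},
    "amount": {"type": "float", "source": "billing system"},
}


def check_invoice(invoice, customers, products, state_codes):
    """Return the problems found when resolving an invoice against master and reference data."""
    problems = []
    customer = customers.get(invoice["customer_id"])
    if customer is None:
        problems.append("unknown customer")
    elif customer["state"] not in state_codes:
        problems.append("invalid state code")
    if invoice["product_id"] not in products:
        problems.append("unknown product")
    return problems
```

An empty list from `check_invoice` means the invoice resolves cleanly; anything else points to a gap in master or reference data rather than in the transaction itself.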

Assess Data Quality

Once data types, elements and key relationships/dependencies have been identified, the next step is ensuring the data can be transformed into high-quality assets. This is done by measuring the data quality.

There are six quantifiable elements of data quality:

  • Completeness: measures the degree to which necessary data is available for use
  • Uniqueness: measures the degree to which each record is unique and cannot be mistaken for another entity
  • Consistency: measures the degree to which the data agrees within a dataset and between datasets
  • Validity: measures the degree to which the data is within defined requirements like format, size, type and range
  • Accuracy: measures the degree to which the data represents reality
  • Timeliness: measures the degree to which the data is available at the time it is needed
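
Several of these elements reduce to simple ratios. The sketch below shows one possible way to score four of them in plain Python. The sample records, the `VALID_STATES` set, the email pattern and the function names are assumptions made for illustration, and the functions assume a non-empty dataset. Accuracy and consistency are left out because they require a trusted source or a second dataset to compare against.

```python
import re
from datetime import date

# Hypothetical customer records, for illustration only.
records = [
    {"id": 1, "email": "ann@example.com", "state": "MO", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "state": "KS", "updated": date(2024, 1, 12)},
    {"id": 2, "email": "bob@example.com", "state": "ZZ", "updated": date(2023, 6, 1)},
    {"id": 4, "email": "not-an-email", "state": "CO", "updated": date(2024, 1, 11)},
]

VALID_STATES = {"MO", "KS", "CO"}  # assumed subset of reference data
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple


def completeness(rows, field):
    """Share of rows where `field` is populated."""
    return sum(1 for r in rows if r[field] is not None) / len(rows)


def uniqueness(rows, field):
    """Ratio of distinct `field` values to rows (1.0 means no duplicates)."""
    return len({r[field] for r in rows}) / len(rows)


def validity(rows, field, is_valid):
    """Share of populated `field` values that pass the `is_valid` rule."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(1 for v in values if is_valid(v)) / len(values)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows whose `field` date is at most `max_age_days` older than `as_of`."""
    return sum(1 for r in rows if (as_of - r[field]).days <= max_age_days) / len(rows)


scores = {
    "completeness(email)": completeness(records, "email"),
    "uniqueness(id)": uniqueness(records, "id"),
    "validity(state)": validity(records, "state", lambda s: s in VALID_STATES),
    "timeliness(updated)": timeliness(records, "updated", date(2024, 1, 15), 30),
}
```

Each score is a fraction between 0 and 1, so a team can set a threshold per element and track it over time.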

Data quality management and assessment solutions are available to automatically measure and score these elements based on user-defined “rules.” An example of metric results is below.

Example Data Quality Metrics

Design Target Data Architecture

You’ve gathered data and identified the gap to achieving your vision statement, so now you can begin to lay out the “future state” data models and architecture. At this point, it is imperative to revisit the business’ data-related goals and vision statement identified in the Strategy phase. This ensures the data models and architecture will support the desired business goals.

Continuing with the previous client example, below is a target, cloud-based data architecture. Here, the data sources are integrated with cloud-based serverless services and automated workflows. The combination of those two aspects meets the client’s goals and data vision, which were to have the ability to scale in a cost-effective manner and to automate processes.

Target Architecture with Automated Processes

Design Target Cloud Architecture

Next, you’ll need to design a target cloud architecture in consideration of the data architecture and data privacy policies. It’s important to document the required resources, data security elements and the CI/CD process.

Tools

Data change management tools can provide features such as:

  • Data cataloging
  • Data quality metrics
  • Data lineage
  • Data privacy checks
  • Dashboards

Popular tools include Alation, Ataccama, Informatica and Talend.

Project Roadmap and Cost Estimates

Based on the target data and cloud architectures, create the project roadmap with cost estimates and a project timeline.  
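
A roadmap and its cost estimate can start as a simple roll-up of work items. The sketch below is one hypothetical way to do that arithmetic in Python. The work items, durations and monthly costs are placeholder figures chosen for illustration, not real estimates, and the `RoadmapItem` structure is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RoadmapItem:
    name: str
    start_month: int      # 1-based month offset from project start
    duration_months: int
    monthly_cost: float   # placeholder figure, not a real estimate

    @property
    def end_month(self):
        return self.start_month + self.duration_months - 1

    @property
    def total_cost(self):
        return self.duration_months * self.monthly_cost


# Hypothetical roadmap, for illustration only.
roadmap = [
    RoadmapItem("Build cloud foundation", 1, 2, 10_000.0),
    RoadmapItem("Migrate data stores", 2, 3, 15_000.0),
    RoadmapItem("Automate ETL workflows", 4, 3, 12_000.0),
]


def project_length(items):
    """Number of months until the last work item finishes."""
    return max(item.end_month for item in items)


def total_cost(items):
    """Sum of the cost of every work item."""
    return sum(item.total_cost for item in items)


def cost_by_month(items):
    """Monthly spend, accounting for work items that overlap."""
    months = {m: 0.0 for m in range(1, project_length(items) + 1)}
    for item in items:
        for m in range(item.start_month, item.end_month + 1):
            months[m] += item.monthly_cost
    return months
```

Breaking the estimate down by month makes overlapping work visible, which is useful when the timeline and the budget are reviewed together.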

Deliverables

From the activities above, the Discover phase should provide the following deliverables:

  • Current state of architecture diagram
  • Data quality metrics
  • Target state of data architecture diagram
  • Target state of cloud architecture diagram
  • Target state of cloud resource CI/CD diagram
  • Project roadmap
  • Cost estimates 

Conclusion

Now that the Strategy and Discover phases are complete, you can see how the data journey builds upon itself: the business goals and supporting data vision identified during the Strategy phase guided the Discover phase, which in turn establishes a blueprint for the third step in the process, Implement.

Stay tuned for the third installment of the Modern Data Management blog series for insight into the activities and deliverables of the Implement phase of your data journey, as well as unique considerations and recommendations.

Contact us today

Sun Jang is a Principal Architect at 27Global. Founded in 2008, 27Global designs, builds, and operates technology solutions for businesses of all sizes. The perfect pairing of local leadership with offshore pricing, 27Global has the business acumen to understand your vision and the expertise to build your technology solution. To learn more, visit 27global.com or connect with us on LinkedIn and Twitter.