Data governance involves a comprehensive set of rules that define how an organisation manages its data. These rules are often spread across multiple policies such as privacy policies, data retention policies, or data classification policies. The cornerstone of any successful data governance initiative is a thorough understanding of the data an organisation possesses, how it flows through enterprise systems, and the purposes for which it is used.
By Nader Henein, VP Analyst at Gartner
Data governance rules stem from two primary sources: regulatory requirements and self-imposed organisational standards. Regulatory rules, driven by legal obligations, often vary across jurisdictions and carry significant penalties for non-compliance. Self-imposed rules, by contrast, are based on best practices and business needs. The first step in applying either kind of rule is to understand where data is stored and how it is used, which requires a systematic data discovery process covering both unstructured and structured data. While many organisations start this journey manually, automation becomes crucial for scalability. Before embarking on automation, security and risk management (SRM) leaders should consider three best practices for effective automated data discovery. Each one assesses a different capacity of the automated discovery platform: to read their data, to interpret it, and to act upon it.
Evaluating Platform Connectivity and Data Reading Capabilities
When selecting a data discovery platform, SRM leaders must ensure it can effectively connect to and read data from diverse sources. Start by compiling a comprehensive list of your organisation’s data repositories before engaging with technology providers. The ideal platform should feature a robust library of upstream connectors, capable of ingesting and analysing data from 80% to 90% of your data stores, depending on how many specialist or legacy systems the organisation maintains.
Additionally, assess the cost and feasibility of developing custom connectors for data stores not covered by the platform. It is crucial to determine whether these connectors can be developed internally using available APIs or whether they require external provider support.
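The coverage check described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor tool: the repository names, system types, and the vendor connector list are all hypothetical, and a real inventory would come from your own asset register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    name: str
    system_type: str  # e.g. "sharepoint", "postgresql"


def connector_coverage(inventory, supported_types):
    """Return (coverage ratio, repositories with no out-of-the-box connector)."""
    if not inventory:
        return 0.0, []
    supported = {t.lower() for t in supported_types}
    gaps = [r for r in inventory if r.system_type.lower() not in supported]
    covered = len(inventory) - len(gaps)
    return covered / len(inventory), gaps


# Hypothetical inventory and vendor connector list, for illustration only.
inventory = [
    Repository("HR file share", "smb"),
    Repository("Finance database", "postgresql"),
    Repository("Collaboration site", "sharepoint"),
    Repository("Customer CRM", "salesforce"),
    Repository("Legacy mainframe archive", "vsam"),
]
vendor_connectors = ["SMB", "PostgreSQL", "SharePoint", "Salesforce"]

ratio, gaps = connector_coverage(inventory, vendor_connectors)
print(f"Coverage: {ratio:.0%}")
for repo in gaps:
    print(f"Needs custom connector: {repo.name} ({repo.system_type})")
```

The list of gaps is the useful output here: it is the set of data stores for which the cost of custom connectors needs to be estimated.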
For unstructured data, confirm that the platform can read all file types used within your organisation. Common formats such as PDF are widely supported, but some organisations rely on specialized formats such as CAD design documents or the Digital Imaging and Communications in Medicine (DICOM) images used by healthcare providers. Verify whether your technology partner supports these file types or whether custom file interpreters are necessary.
For structured data, evaluate the platform’s ability to connect to repositories via Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) connectors, or through application-specific APIs.
By thoroughly considering these factors, SRM leaders can select a discovery platform that effectively covers the enterprise landscape, balancing out-of-the-box capabilities with the need for custom development.
Evaluating the Platform’s Learning and Recognition Capabilities for User-Defined Data Attributes
When assessing a data discovery platform, SRM leaders must consider its ability to learn and recognize user-defined data attributes. While solutions often come with preprogrammed tags or labels such as “Personal,” “Sensitive,” or “HR,” it is essential to configure these to align with your organisation’s specific data needs. The technology used to ingest data and extract appropriate tags based on data attributes may be pattern-driven through regular expressions, AI-driven through machine learning, natural language processing, and computer vision, or a combination of both.
It is unrealistic to expect that all necessary tags will be preprogrammed into the platform. Therefore, a critical evaluation point is the platform’s ability to “learn” and recognize new data attributes through training or programming. Conduct a simple trial to assess the platform’s proficiency in identifying new data attributes defined by your organisation and applying the appropriate tags to relevant data or files. This process typically involves collaboration between the vendor’s implementation team and your internal team, who will later manage the platform. The trial could range from tagging PDF files with an “invoice” label to more complex tasks like extracting order numbers from scanned invoices and applying custom tags.
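As a rough picture of what the pattern-driven end of such a trial might look like, the sketch below tags extracted text with regular expressions and pulls out order numbers. Everything in it is an assumption made for illustration: the tag names, the patterns, and the "PO-" plus six digits order number format are hypothetical, and a commercial platform would expose its own configuration mechanism rather than this code.

```python
import re

# Hypothetical tag rules; the patterns are illustrative, not vendor-supplied.
TAG_PATTERNS = {
    "invoice": re.compile(r"\binvoice\b", re.IGNORECASE),
    "personal": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # crude email match
}

# Assumed in-house order number format: "PO-" followed by six digits.
ORDER_NUMBER = re.compile(r"\bPO-\d{6}\b")


def tag_text(text):
    """Return (sorted tags, extracted order numbers) for a block of text."""
    tags = {tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(text)}
    order_numbers = ORDER_NUMBER.findall(text)
    if order_numbers:
        tags.add("has-order-number")
    return sorted(tags), order_numbers


# Text as it might come out of a PDF or OCR step (hypothetical sample).
sample = (
    "INVOICE\n"
    "Order No: PO-104233\n"
    "Bill to: jane.doe@example.com\n"
    "Total due: 1,250.00"
)

tags, orders = tag_text(sample)
print(tags)
print(orders)
```

Regular expressions handle well-formed identifiers like these; scanned or loosely formatted documents are where the machine learning and computer vision options mentioned above would usually be needed.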
Ensuring Your Platform Can Orchestrate Data Governance Activities
Once your data repositories are scanned and tagged, the next step is to leverage this information effectively. The objective is not merely to understand the data but to operationalize the discovered tags to automate data governance activities. For instance, these tags can be used to automate data classification based on various tag combinations or to trigger rules in your data retention schedules. For example, a “CV” document governed by the General Data Protection Regulation (GDPR) could be deleted automatically once the time elapsed since its “last modified” date exceeds a predefined retention limit.
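The CV example can be expressed as a simple tag-driven retention rule. The sketch below is a minimal illustration under stated assumptions: the `TaggedFile` structure, the tag names, and the 365-day limit are hypothetical (the GDPR does not prescribe a specific period), and the code only reports deletion candidates rather than deleting anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TaggedFile:
    path: str
    tags: frozenset
    last_modified: datetime


# Hypothetical retention schedule: required tag combination -> maximum age
# since last modification. The 365-day figure is an assumption.
RETENTION_RULES = [
    (frozenset({"CV", "GDPR"}), timedelta(days=365)),
]


def files_due_for_deletion(files, now):
    """Return files that match a rule's tags and exceed its age limit."""
    due = []
    for f in files:
        for required_tags, max_age in RETENTION_RULES:
            if required_tags <= f.tags and now - f.last_modified > max_age:
                due.append(f)
                break  # one matching rule is enough
    return due


# Fixed reference date so the example is reproducible.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
files = [
    TaggedFile("hr/cv_old.pdf", frozenset({"CV", "GDPR"}),
               datetime(2023, 6, 1, tzinfo=timezone.utc)),
    TaggedFile("hr/cv_recent.pdf", frozenset({"CV", "GDPR"}),
               datetime(2024, 9, 1, tzinfo=timezone.utc)),
    TaggedFile("finance/report.xlsx", frozenset({"Finance"}),
               datetime(2020, 1, 1, tzinfo=timezone.utc)),
]

for f in files_due_for_deletion(files, now):
    print(f"Due for deletion: {f.path}")
```

Keeping the rule table separate from the matching logic mirrors how retention schedules are usually maintained: governance staff adjust the tag combinations and limits, while the orchestration layer stays unchanged.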
To achieve this, ensure that the platform you select can orchestrate your planned downstream tasks based on your data’s characteristics, such as automating data retention or classification. This orchestration can be accomplished either natively within the discovery solution or through downstream connectors to third-party platforms, like archival solutions within your enterprise architecture. Some specialized discovery platforms even offer enterprise clients the ability to develop their own downstream connectors using documented APIs. By verifying these capabilities, SRM leaders can ensure seamless integration and execution of data governance activities across the organisation.