Is There Gray When It Comes To Technology Independence for Ontology Models, Semantic Models, and Logical Data Models?

When it comes to ontologies, semantic models (conceptual data models), and logical data models, as well as the conceptual and logical architectures they represent, achieving technology independence can be a little tricky. Often, there is a lot of gray when determining whether these models or architectures are truly technology independent. It can also be a matter of great discussion - and disagreement. Sometimes the line between being technology independent and technology dependent is clear, and sometimes it is a little blurry, especially given that the modeling notations and tools available on the market are primarily technology constrained. Sometimes, however, the meaning of technology independence is simply misunderstood.

What Does "Independent of Technical and Implementation Considerations" Really Mean?    

The statement that "conceptual and logical architectures or models are independent of technical and implementation considerations" is often misunderstood. It does not mean modeling at these levels is not "technical" or that advanced technology is not involved. In fact, quite the opposite is true. Technical methods from multiple fields of study are used to effectively and correctly create and manage conceptual data architectures, including ontologies and semantic models, and logical data architectures.

An example of another field of study used in data modeling is linguistics, which is most definitely a technical field of study. Linguistics is at the core of modeling data at the conceptual and logical levels. For example, many linguistic rules of the English language (or "fill in the blank" language) related to semantics, syntax, and pragmatics are used to produce proper names and effective definitions for all data model components, and to correctly and effectively state business rules and their resulting facts and relationships. Another technical field of study used in data modeling is computer science. Advanced computer technology and tools are required to depict and house the models representing conceptual and logical architectures: modeling tools, modeling formalisms, business rule management tools and techniques, metadata repositories, etc. Mathematics is yet another technical field of study used in data modeling; mathematical processes are used to normalize data models and architectures. Therefore, building and managing conceptual and logical architectures, and the data models that represent them, are indeed technical endeavors involving multiple advanced technologies and fields of study.

What does the phrase "independent of technical and implementation considerations" really mean? It means conceptual and logical architectures, and the models that represent them, should not be constrained by any type of technology, nor should they be based on specific implementation decisions. It also means they should be independent of whether something is manual or automated, and independent of any specific technical or automated solutions at the physical levels. In today's world, that usually translates to saying conceptual and logical architectures or models are defined and depicted independent of: (a) manual vs. automated decisions, (b) any type of database management system (DBMS), (c) any programming, computer, or scripting language, and (d) any specific software or tool vendor's systems, databases, or languages used to automate all or part of business systems. It also means that the specific technology used for modeling should not constrain the architectures. However, that is where we run into a conundrum, even in today's heavily automated world, due to the tools and technology currently available to the general public for managing conceptual and logical data architectures.

Semantic, Conceptual, and Logical Data Model Formalisms, Notations, and Tools Blur the Line

The line between being technology independent and technology dependent has never been more blurry than it is when it comes to the modeling formalisms, notations, and automated tools available and typically used to manage and depict ontologies, semantic models, and logical data models. From the time they were invented, the majority of manual and automated data modeling formalisms and tools have described and depicted models using technology constrained modeling notations. (This is also true for function, process, use-case, and workflow modeling tools.) I know there are technology independent modeling notations that can be used for ontologies, semantic models, and logical data models. However, even at this point in our industry, they are not used for data modeling at most companies. In my own experience, the few modeling formalisms or notations that are, or at least appear to be, technology independent have not been accepted or used at any of the companies where I have worked. And from my experience, most data architects are not aware of them - for a variety of reasons. There are, as usual, a couple of exceptions to the rule.

In less than a handful of instances over the course of my career, rogue architects quietly used technology independent data modeling notations and/or tools to create ontologies or semantic data models. However, this created the problem of having data models built from the viewpoint of one or two architects that were virtually unknown to, and not endorsed by, the owners and end users of that data. It also resulted in models either built by hand or with lesser known modeling tools that could not be easily integrated with the other models, let alone uploaded and stored in enterprise metadata repositories.

Most companies where I have worked would not endorse the use of these lesser known modeling notations and tools. Primarily, this was because they owned the more widely available modeling tools and metadata repositories that came with pre-built applications used to transfer data between the repository and the modeling tools. 

Metamodels Are Sometimes Technology and Implementation Dependent (even though they should not be)

The line between technology independence and technology dependency is also blurry for metamodels - models of the models or architectures themselves. Metamodels are very important - critical, in fact. However, conceptual and logical metamodels, like regular models, are typically defined and depicted in, you guessed it, technology dependent tools and notations.

One of the biggest issues associated with metamodels is the time required to build them, especially from scratch. Almost every organization managing enterprise architectures ends up spending months, or years, creating and solidifying ontologies and other models of the architectures they need to explicitly manage. Part of this problem is due to the fact that many organizations still believe "we are different" or "we are unique in our line of work". Yet metamodels are pretty much identical for all organizations, in all fields of study, and for companies of all sizes.

In my opinion, organizations should not have to build their own metamodels - or think they are different from anyone else when it comes to metamodels for enterprise architectures. I would like to see metamodels that are available at a cost affordable to organizations of all sizes. Also, conceptual and logical metamodels, just like regular models at these levels, should be independent of technology and implementation considerations, as well as independent of vendors' specific modeling tools and metadata repositories.

There are some vendors who deliver metamodels with their products. However, the ones I have seen are extremely vendor dependent; they only include data that will be stored in the vendor's tool (an implementation consideration); they are usually at a very high level; they rarely include detailed and effective definitions, business rules, etc.; and they are rarely normalized to 3NF, let alone 4NF. Also, vendors often change or denormalize the metamodels to match specific DBMS or implementation considerations - in other words, they are technology dependent.

Data Normalization Is About Normalizing Logic - Not Just Relational Models or Databases

While the line between being technology independent and being technology dependent is sometimes blurry, there are times when it is simply misunderstood. Let's take, for example, the process of data normalization. Data normalization is often incorrectly considered to be a process only applied to physical models or physical databases, specifically, relational models and databases.

The primary objective of normalization is to normalize logic, not physical models or databases. In fact, normalization is based on well-established mathematical processes, such as first-order logic (also called first-order predicate calculus) and set theory, that existed long before databases or computers were invented. There are just as many benefits, if not more, to applying normalization to conceptual data or semantic models and logical data models as there are to applying it to physical data models or databases. The most benefits are realized when a conceptual/semantic or logical data model is normalized to at least 4NF.
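To illustrate that normalization is a logical exercise rather than a database exercise, here is a minimal sketch in Python; the employee, skill, and language facts are purely hypothetical and stand in for any two independent multivalued facts about one concept. It separates the two facts into their own relations - essentially what 4NF requires - and verifies, using nothing but set logic, that rejoining them loses no information.

```python
from itertools import product

# One "employee" concept with two independent multivalued facts (hypothetical data):
# the skills an employee has and the languages an employee speaks.
# Recording both facts in a single relation forces a row for every combination.
unnormalized = {
    ("E1", "Modeling", "English"),
    ("E1", "Modeling", "French"),
    ("E1", "Ontology", "English"),
    ("E1", "Ontology", "French"),
}

# 4NF-style decomposition: each independent multivalued fact gets its own relation.
employee_skill = {(emp, skill) for (emp, skill, _lang) in unnormalized}
employee_language = {(emp, lang) for (emp, _skill, lang) in unnormalized}

# Lossless-join check: rejoining the two relations on the employee identifier
# reproduces the original facts exactly. This is a property of the logic
# (set theory and predicate calculus), not of any DBMS or storage structure.
rejoined = {
    (emp_s, skill, lang)
    for (emp_s, skill), (emp_l, lang) in product(employee_skill, employee_language)
    if emp_s == emp_l
}
assert rejoined == unnormalized
```

Nothing in the sketch depends on SQL, a DBMS, or physical storage; the decomposition and the lossless-join check apply equally well to a conceptual/semantic or logical model.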

Ironically, at most companies, almost all physical models and databases are actually DE-normalized, typically for performance reasons. This is especially true for specialized databases such as data warehouses and data marts. Therefore, to say that normalization only applies to relational physical models or relational databases is being very shortsighted and ignorant (meaning lacking awareness or knowledge). For a further discussion on the topic of data normalization, please read the article titled "Data Normalization Is About Normalizing Logic".

Technology Independent Tools for Effectively and Easily Managing the Entire Enterprise Architecture Still Don't Exist

Explicit conceptual and logical architectures (including ontologies and semantic models), as well as the metamodels defining these architectures and models, have all existed to some degree. They have existed in companies of all sizes, in all types of industries and fields of study, and in countries all over the world. Yet it is still difficult, if not impossible, to find technology independent automated modeling formalisms and tools.

The task becomes even more daunting when you look for a technology independent tool that models data, functions, processes, workflows, use cases, etc., and that is ALSO seamlessly integrated with an effective and efficient metadata repository. It becomes impossible when you add the requirement for the metadata repository to be scaled (large enough and fast enough) to effectively manage the entire enterprise architecture, especially for a large corporation.

If there are suites of tools that meet all of these requirements today, they are not widely known, nor are they available to the general public. Of course, some companies can afford to hire consultants or software vendors to build a "one-off" (and very expensive) solution that meets all these requirements, but this does not solve the problem for small organizations or the industry as a whole.

There is also the decades-long problem of having to play games with most, if not all, currently available modeling tools and/or metadata repositories to get them integrated and able to effectively manage even portions of the enterprise architecture, let alone the entire enterprise architecture. By play games I mean: build or heavily modify the structures used to contain the metadata; change the meaning of objects and/or icons that come with the tools - especially modeling tools; create complicated "user defined" reports from various viewpoints; set up complicated and lengthy job runs; use SQL or SQL-like languages to create long and complicated queries; create user interfaces that are actually user friendly, to replace the supposedly user friendly ones that come with the tools; etc.

Another long-standing issue that has not yet been effectively resolved is the fact that the sheer amount of metadata that exists at the enterprise level cannot literally be stored and integrated in one physical metadata repository. Even if it could be stored in one repository, it could not easily be kept up-to-date. Also, should all metadata be stored? Some people feel that it's not necessary to store every single piece of metadata about every instance of physical data - especially for data that is "temporary".

These issues bring up the topic of virtual or federated repositories. Virtual and federated metadata repositories are terms and concepts that have been thrown around for quite a while now. However, I don't think there is common agreement within the industry as to what these terms actually mean or how they should be implemented. Also, I haven't yet seen any virtual or federated tools in existence, except in development at some vendors, and I do not know of any available to the general public.

Please don't start writing me a bunch of email messages complaining about my last few statements! Yes, I know there are some decent modeling tools that exist today, and I realize some of them are integrated with one or more powerful metadata repositories. I've probably used most of those at some point during my career! And I will guarantee that I have used a lot of modeling tools and several repositories that most current architects and modelers have never heard of, because they either no longer exist or have morphed into entirely different tools. Some tools that have been around for a while originally had different names and/or different owners than they do now. In addition to working as an employee for medium and large-sized corporations, I have also worked for software and metadata repository vendors, methodology companies, and consulting companies that created and sold modeling tools. So, I am fully aware that some of the modeling tools and metadata repositories on the market right now are very capable in some respects and/or have improved greatly over the years. But none of them do everything that is needed of them - at least not yet.

Also, PLEASE do not send me messages saying the primary problem with most of the enterprise scaled modeling tools and metadata repositories is that they and/or the programming languages used to build them are "proprietary". ALL modeling tools, databases, languages, etc. are "proprietary" - or at least what most people mean when they use the word proprietary in this context - including DB2, Teradata, SQL, MySQL, etc. Just because some of these products are more common and well-known than others doesn't mean they are any less "proprietary" than the less common or lesser known products. I would even go so far as to say many portions of open source operating systems and CMSs (content management systems) are also "proprietary", even if anyone and his brother can obtain the code and modify it - or mess it up.

So, please don't send me a bunch of messages telling me I'm wrong or that I just don't understand - at least not about these particular topics. (The exception to this rule is I would like to know when tools meeting all of the requirements are available to the general public.)

Still Looking for Data Modeling Tools That Manage and Map all Levels of Data Models – Automatically

I already mentioned the lack of enterprise scaled, technology independent, easy to use modeling tools integrated with enterprise metadata repositories. More specifically related to data modeling is the difficulty of finding a data modeling tool that can effectively, easily, and affordably manage all levels of data models, as well as automate most of the mappings between the levels.

Again, I realize there are tools that can perform these processes to some extent. But most of the time, the tools that produce physical models and database models from logical models are limited to the data that will be automated. Also, the generated physical models have to be heavily modified or changed completely in order to be literally implemented in databases, especially specialized databases such as data warehouses and data marts.

One organization where I worked took a slightly different approach to solving the issues related to the transition from an enterprise logical model to individual physical models or database models. They started with the enterprise semantic model and logical data model, but then created a whole new "logical" model they referred to as their "enterprise logical model". It was stripped of any data that would not be stored in a database, code tables were added, and other changes were made to reflect their technology and implementation choices. This modified "logical" model was then used to generate physical models for databases and warehouses, as well as some DDL for certain types of programs.

I still haven't decided if the method used by that organization to transition from logical to physical is more cost effective and easier to manage than starting from a technology independent logical model and creating technology dependent physical models for each instance of a database, program, warehouse, etc.

Part of the issue lies in the differences in technology needed for each type of DBMS. The other problem is that there is not always a one-to-one mapping between logical attributes and physical fields or database columns. This is especially true in a large organization using multiple types of databases, programming languages, warehouses, etc. Regardless, we need a better set of tools and technology to perform these tasks. I know the capability is out there and the technology exists to perform these tasks in the way we need them to be performed. It just needs to be cost effective and available to the general public.
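As a rough sketch of why the logical-to-physical transition is rarely a simple rename, consider the following Python example. The entity, attribute names, type mappings, and target DBMS labels ("dbms_a", "dbms_b") are all hypothetical and purely illustrative; this is not a claim about how any particular modeling tool or database product behaves. It shows a single logical attribute expanding into several physical columns, with data types that differ by target.

```python
# Hypothetical logical entity: names and types are illustrative only.
logical_entity = {
    "name": "Party",
    "attributes": [
        {"name": "Party Name", "type": "text"},
        {"name": "Postal Address", "type": "address"},  # one logical attribute...
    ],
}

# ...that a physical designer might expand into several columns,
# with data types that vary by hypothetical target DBMS.
TYPE_MAP = {
    "dbms_a": {
        "text": "VARCHAR(255)",
        "address": ["address_line_1 VARCHAR(100)",
                    "address_line_2 VARCHAR(100)",
                    "postal_code VARCHAR(20)"],
    },
    "dbms_b": {
        "text": "NVARCHAR2(255)",
        "address": ["address_line_1 NVARCHAR2(100)",
                    "address_line_2 NVARCHAR2(100)",
                    "postal_code NVARCHAR2(20)"],
    },
}

def generate_ddl(entity: dict, target: str) -> str:
    """Emit a simple CREATE TABLE statement for one hypothetical target DBMS."""
    columns = []
    for attr in entity["attributes"]:
        mapped = TYPE_MAP[target][attr["type"]]
        if isinstance(mapped, list):
            # One logical attribute becomes several physical columns.
            columns.extend(mapped)
        else:
            columns.append(attr["name"].lower().replace(" ", "_") + " " + mapped)
    return ("CREATE TABLE " + entity["name"].lower() + " (\n  "
            + ",\n  ".join(columns) + "\n);")

print(generate_ddl(logical_entity, "dbms_a"))
print(generate_ddl(logical_entity, "dbms_b"))
```

Even in this toy case, the mapping from logical attributes to physical columns is one-to-many and target-specific, which is exactly where the hand work (or better tooling) comes in.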

In Summary .....

The intention is to keep conceptual and logical models independent of technology and implementation choices and considerations. However, even at this point in time, it's difficult to literally keep technology and implementation considerations from infecting them.

This is especially true of data models, in large part because the most widely used and generally available conceptual and logical modeling formalisms, notations, and tools are technology dependent or vendor dependent. The result is conceptual and logical data models and metamodels that are not 100% technology independent - even those models representing conceptual and logical architectures and meta-architectures.

Most importantly, the IT and information science industries are still missing generally available, technology independent modeling tools or tool suites, integrated with powerful metadata repositories, that can effectively, easily, and affordably manage the entire Enterprise Architecture. Even though we are far past the time I originally thought these tools, formalisms, and notations would exist, I still have hope they will become easily accessible and accepted as standard before I retire (which is still a long way off).

Note:  How These Issues Affect the Articles and Standards on This Website

Please note that the articles and standards on this website will frequently use E/R, ERD, or UML notations. This is due to the currently available, and commonly used, technology constrained modeling notations and tools on the market, discussed at length in the preceding article. However, the use of these notations on this website is not an endorsement of any of them.

Also, please keep in mind this website primarily addresses conceptual and logical data architectures, not physical data architectures. Therefore, any examples of conceptual or logical data models depicted on this website using a specific notation will look, and be, very different from physical data models or database models produced using those same notations - especially those examples depicted using UML notation.