Information Architecture Blog

Article Preface

Many of us who are experienced in creating enterprise architectures and the models that represent them hold at least two fundamental beliefs. (1) For a business to operate, the entire enterprise architecture must exist, whether implicitly or explicitly. (2) There are important differences between architectures and models. However, in this article those differences are irrelevant, as the majority of the concepts and the primary messages discussed here apply to both architectures and models. Therefore, for brevity’s sake, and to avoid arguments about whether the word “architecture” is being used correctly, the word “model” (or “models”) will be used in this article in place of the phrases “architecture and/or model”, “model and/or architecture”, and their plural versions.

The phrase “semantic model” will be used to refer to a model of the ontology of an enterprise. This model is sometimes referred to as a “concept model”, a “fact model”, a “terms and facts model”, or even a “conceptual data model”. However, in this article, the term “semantic model” is used to represent the primary concept expressed by all of these terms.

Finally, in this article the words “entity” and “class” are used to represent the same concept. This is because some modelers use UML class models to represent logical data models, as they find UML class models to be more expressive than traditional ER or ERD models. In these types of UML class models (1) classes do not include operations (or methods), and (2) the rules of logical data modeling are applied, such as normalizing data, naming entities and attributes using English language noun phrases (or whatever rules apply to another natural language used by an enterprise), and keeping the structure independent of technical and physical implementation considerations. Therefore, in this article the words “entity” and “class” are used synonymously.

Where Do Entities, Attributes, and Relationships Originate?

To understand why logical data model entities have definitions yet attributes and relationships have descriptions, it is first important to understand the source of these logical data model components. That’s relatively easy. Entities are derived from semantic model concepts, and attributes and relationships are primarily derived from semantic model facts. However, to really understand the reason entities have definitions but the remaining logical data model components have descriptions, it is important to understand the individual semantic model objects from which they originate.

Semantic models and conceptual business rules have a chicken and egg type of relationship.  The business rules are the reasons why all of the concepts are being used by an enterprise.  However, business rules are expressed using words and phrases that represent concepts.  So, to really understand business rules, don’t the concepts first need to be understood?   Regardless of which comes first, business rules and the semantic model do not exist independent of each other.[1]

A semantic model defines and describes the ontology of an enterprise. The primary components of a semantic model are one or more vocabularies of interrelated concepts, their definitions stated in a natural language, terms (words or phrases) used to refer to the concepts, and facts that assert relationships among those concepts. Concept definitions are based on the parts-of-speech played by the concepts and the contexts in which they are used. There are other objects in a semantic model, but these are the primary objects.

Most semantic model formalisms explicitly associate facts to objects in conceptual business rule models. Depending upon the formalism and/or modeling tool being used, semantic model diagrams may contain symbols or icons that depict cardinality, which expresses certain business rules or business rule statements. Other, more advanced modeling tools allow facts to be directly related to all business rules that govern them.

Concepts in semantic models turn into entities (or classes) in a logical data model. Semantic model facts turn into logical data model attributes or relationships. Normalized logical data model relationships, and many diagrams on which they are depicted, state some business rules or business rule statements through the cardinality (or multiplicity) stated for each relationship. Cardinality expresses the minimum and maximum number of times an entity is involved in a relationship.
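The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (Concept, Fact, Cardinality, Relationship), the "Customer places Order" example, and the placeholder definitions are all hypothetical and do not come from any modeling standard or tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A semantic model concept (hypothetical structure)."""
    term: str
    definition: str


@dataclass(frozen=True)
class Fact:
    """A semantic model fact: an assertion relating two concepts."""
    subject: Concept
    verb: str
    obj: Concept


@dataclass(frozen=True)
class Cardinality:
    """Minimum and maximum number of times an entity takes part in a relationship."""
    min_count: int
    max_count: Optional[int]  # None stands for "many"

    def __str__(self) -> str:
        upper = "*" if self.max_count is None else str(self.max_count)
        return f"{self.min_count}..{upper}"


@dataclass(frozen=True)
class Relationship:
    """A logical data model relationship derived from a fact."""
    source_entity: str
    verb: str
    target_entity: str
    cardinality: Cardinality

    def read(self) -> str:
        return f"{self.source_entity} {self.verb} {self.target_entity} [{self.cardinality}]"


def relationship_from_fact(fact: Fact, cardinality: Cardinality) -> Relationship:
    # Entities take their names from the concepts' terms; the cardinality
    # is supplied separately because it comes from business rules, not the fact.
    return Relationship(fact.subject.term, fact.verb, fact.obj.term, cardinality)


# Placeholder concepts and definitions, invented for this sketch.
customer = Concept("Customer", "A party that buys goods or services.")
order = Concept("Order", "A request to buy goods or services.")
places = Fact(customer, "places", order)

rel = relationship_from_fact(places, Cardinality(0, None))
print(rel.read())  # Customer places Order [0..*]
```

Passing the cardinality in separately mirrors the point made in the text: the fact only makes the assertion, while the minimum and maximum come from the business rules that govern it.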

Logical business rules are important to the structure of logical data models as they affect normalization. Logical data models need to be fully normalized to at least 3rd or 4th normal form (3NF or 4NF), which may result in changes to, or additions of, one or more attributes, relationships (including associative entities), and entities. Therefore, as is the case at the conceptual level, logical data models and logical business rules work together; they do not exist independent of each other.

Understanding an enterprise’s semantic model and conceptual business rules model is required in order to accurately build and manage logical data models, regardless of whether the semantic model and conceptual business rule model are explicitly articulated from the top down, bottom up, or middle out. Bottom-up and middle-out articulation are just two of many ways to back into the explicit articulation of the semantic model and the conceptual business rule model.

Creating models from the bottom-up, middle-out, or any combination of these approaches doesn’t mean an enterprise’s semantic model, business rules, and other conceptual models don’t already exist; the enterprise would not be able to operate without them (or the rest of the enterprise architecture). It simply means one or more people have the semantic model and other conceptual models in their heads instead of in a modeling tool and/or metadata repository.

Which Model Components Have Definitions and Which Have Descriptions?

Does the fact that the enterprise’s semantic model technically existed (along with other conceptual models) prior to the building of the enterprise logical data model mean all the logical data model components have already been defined or described? In some cases the answer is yes, but in most cases the answer is no. The answer depends upon the logical data model component.

Each concept has one definition, which is really a sense of definition based on the part-of-speech it plays and the context in which it is being used.  Almost all definitions are written this way even if definitions don’t explicitly state the part-of-speech and context.   Each concept definition should be concise and effective.  An effective definition is one that is semantically correct, syntactically correct, and pragmatic.  Pragmatic in this sense means the definition is understood (by the receivers of the information) exactly as it was intended to be understood (by the sender of the information).

Terms are simply words or phrases used to express concepts, no more and no less. Concepts have definitions; therefore, terms don’t have definitions. They are simply words or phrases used to identify or refer to concepts. Synonyms are often identified for concepts, but they are only additional terms used to express concepts.

Each fact is an assertion about the relationship between or among two or more concepts used in the ontology of the enterprise. Facts are most effectively articulated as statements written as complete sentences (containing a subject and a predicate). Facts simply make assertions; they do not include rules about those facts.[2]

Because facts relate multiple concepts, they do not have definitions any more than a sentence has a definition. The concepts (nouns and verbs) used in the sentences to state facts have definitions, but the fact itself does not have a definition. Some modelers include descriptions of facts, but it is debatable whether such descriptions add any understanding beyond the statements made by the facts themselves.

Entities and classes are derived one to one from concepts, so their definitions already exist. It becomes redundant to redefine them in a logical data model. It is also a poor metadata management practice to redefine concepts everywhere they are used in models, including as the definition of entities. The definitions of the concepts from which entities or classes are derived should be managed in one primary location. In other words, just like regular data, there needs to be a “master record” of metadata, including the definition of words and phrases, or rather the concepts represented by these words and phrases.

Please understand that storing metadata and reporting metadata are not the same thing. When a logical data model is displayed on a screen or printed on a report, entities need to have definitions. However, to prevent a metadata maintenance nightmare, the definitions should be stored in the concepts from which the entities are derived.
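One way to picture the difference between storing and reporting is the minimal Python sketch below. It assumes a hypothetical metadata structure in which an entity holds a reference to its concept rather than a copy of the definition; the class names and the placeholder definitions of “Person” are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    term: str
    definition: str  # the master record of the definition lives here


@dataclass
class Entity:
    name: str
    concept: Concept  # a reference to the concept, not a copy of its text

    @property
    def definition(self) -> str:
        # Reported on demand from the concept; nothing is stored on the entity.
        return self.concept.definition


# Placeholder definitions, invented for this sketch.
person_concept = Concept("Person", "An individual human being.")
person_entity = Entity("Person", person_concept)

# Updating the master record changes what every report of the entity shows.
person_concept.definition = "A human being, living or deceased."
print(person_entity.definition)  # A human being, living or deceased.
```

A modeling tool that keeps its own copy of the definition would instead need a synchronization step back to the concept, which is the maintenance burden the single master record is meant to avoid.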

Make no mistake, it is perfectly acceptable and normal for modelers to download and store concept definitions as entity definitions in one or more modeling tools while creating or modifying components of logical data models. It’s also normal, and expected, during the logical data modeling process to further refine and update the semantic models. New concepts come along with changes in technology, and vocabularies naturally change over time. However, if the logical data modeling process discovers new concepts, identifies existing concept definitions that need to change, or determines that more modern terms need to be used to express concepts, it is critical for the semantic model to be updated to reflect the new or updated knowledge about the enterprise.[3]

Translating facts into attributes and relationships is more complex than translating concepts into entities or classes. The decision of whether a fact turns into an attribute or a relationship is primarily based on the business rules associated with the fact and the rules of data normalization, especially 1st Normal Form (1NF). However, sometimes the decision of whether a semantic fact turns into a logical data model attribute or a relationship comes down to a judgement call: the important “art” part of modeling.
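As a rough illustration of how business rules and 1NF can drive this decision, the Python sketch below applies a deliberately simplified heuristic. The heuristic itself is an assumption made for this example, not a rule stated in this article: a fact whose object is itself an entity, or whose governing rule allows more than one value, is suggested as a relationship, and anything else as an attribute. The names GovernedFact and suggest_component, and the phone number fact, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GovernedFact:
    """A fact plus the parts of its business rules this sketch looks at."""
    statement: str
    max_count: Optional[int]   # None means "no upper limit"
    object_is_entity: bool     # True if the object concept is itself an entity


def suggest_component(fact: GovernedFact) -> str:
    """Suggest 'attribute' or 'relationship'; a modeler may still overrule it."""
    if fact.object_is_entity:
        return "relationship"
    if fact.max_count is None or fact.max_count > 1:
        # A repeating value would violate 1NF if kept as a single attribute.
        return "relationship"
    return "attribute"


birth = GovernedFact("A Person is born on a date", max_count=1, object_is_entity=False)
phones = GovernedFact("A Person has a phone number", max_count=None, object_is_entity=False)

print(suggest_component(birth))   # attribute
print(suggest_component(phones))  # relationship
```

The function only returns a suggestion because, as noted above, the final choice sometimes comes down to a judgement call that no simple rule captures.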

As is the case with facts, each attribute and relationship represents multiple concepts, not just one concept. This means that, also like facts, attributes and relationships cannot be “defined” any more than a sentence can be “defined”. Therefore, attributes and relationships do not have definitions; attributes and relationships have descriptions.

Attribute and Relationship Descriptions

Logical data model relationships focus on the verbs asserted by facts. Each relationship is typically articulated on the diagram so that when read from entity to relationship to entity, it basically reads like a subject-verb-object sentence. The primary sentence of a relationship description should be written as a complete sentence containing the name of the first entity (a noun concept), the relationship itself (a verb concept), and the second entity (a noun concept).

Typically, a relationship depicted on a logical data model diagram not only represents the fact from which it was derived, but also represents two or more business rules, or business rule statements, associated with that fact. Therefore, each relationship’s description often consists of multiple sentences in order to accurately and more distinctly articulate the multiple pieces of information represented by that relationship.

It is important to point out that most logical data modeling tools use one relationship line on a diagram to represent two different relationships between the same two entities. One relationship depicted by that relationship line begins with an entity at one end of the line, includes a verb or verb phrase (or verbal noun) written on the relationship line itself (or in a circle attached to the line), continues to a second entity at the other end of the relationship line, and depicts cardinality for that relationship. The second relationship simply goes in the opposite direction along the same relationship line. Therefore, the description of a “relationship” depicted on a logical data model diagram typically includes at least two sentences, each of which articulates one of the two distinct relationships represented by the one relationship “line” on a diagram.
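The two-sentence structure of such a description can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical representation in which each relationship line stores one Direction per reading; the class names, the sentence template, and the “Customer places Order” example are assumptions made for this sketch, not the output format of any particular modeling tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Direction:
    """One of the two relationships carried by a single relationship line."""
    source: str
    verb_phrase: str
    target: str
    min_count: int
    max_count: Optional[int]  # None means "no upper limit"

    def quantity(self) -> str:
        if self.max_count is None:
            return f"{self.min_count} or more"
        if self.min_count == self.max_count:
            return f"exactly {self.min_count}"
        return f"{self.min_count} to {self.max_count}"

    def sentence(self) -> str:
        # Subject, verb, object, with the cardinality folded into the sentence.
        return (f"Each {self.source} {self.verb_phrase} "
                f"{self.quantity()} {self.target} instance(s).")


@dataclass(frozen=True)
class RelationshipLine:
    """A diagram line: the same two entities read in both directions."""
    forward: Direction
    reverse: Direction

    def description(self) -> str:
        return f"{self.forward.sentence()} {self.reverse.sentence()}"


line = RelationshipLine(
    forward=Direction("Customer", "places", "Order", 0, None),
    reverse=Direction("Order", "is placed by", "Customer", 1, 1),
)
print(line.description())
# Each Customer places 0 or more Order instance(s). Each Order is placed by exactly 1 Customer instance(s).
```

A real description would usually be edited by hand into more natural wording; the sketch only shows that each direction contributes its own sentence and its own cardinality.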

Attributes are characteristics of entities or classes, and they are represented by noun phrases. In a properly normalized data model there is one instance of one attribute (fact) for one instance of an entity (concept)[4]. Therefore, an attribute’s cardinality of 1,1 with the entity it characterizes is assumed by the fact that it’s an attribute. However, just like relationships, attributes represent facts: associations among two or more concepts. For example, the attribute “Person Birth Date” is a characteristic of the entity/class “Person” representing the fact “A Person is born on a date”. Therefore, like relationships, attributes have descriptions.

Do Not Define Each Concept Appearing in a Description for Attributes and Relationships

Attribute and relationship descriptions should not include the definition of each of the terms used in that description. Defining each concept represented by an attribute in its description has the same result as defining each term used in a sentence in the sentence itself. Doing so would result in a very long, complex, and in most cases grammatically incorrect sentence. More importantly, the meaning intended to be conveyed by the sentence would be completely lost by the end of the sentence. Long and rambling attribute descriptions are also not necessary to understand “what” data or information is represented by that attribute; in fact, they often take away from the meaning.

If one doesn’t understand the meaning of one or more terms used in a sentence, typically one looks them up in a dictionary or queries the internet for definitions. Not all of the meaning of a sentence will be contained in that one sentence. Likewise, if one or more of the terms used in the description of an attribute or relationship are not understood, look them up in the semantic model. For a variety of reasons, it is neither necessary nor desirable for an attribute or relationship description to also contain the definitions of all words used in that description.

In Summary

Entities are derived from semantic model concepts and therefore they have definitions. However, from a metadata maintenance perspective, definitions should be stored with the concepts but reported as both concept and entity definitions. This does not, however, mean that concept definitions should not be loaded into logical data modeling tools as entity definitions. It just means the master record of entity definitions should be the concept definitions.

Attributes and relationships are derived primarily from facts, and therefore they have descriptions. However, determining whether a fact becomes an attribute or a relationship is much more complex than deriving entities from concepts. Other factors, such as business rules and normalization, help make this determination. Therefore, attribute and relationship descriptions are different from the statements of the facts from which they are derived.

 

[1] There are many sources of well written information about Business Rules on the internet, including the Business Rules Community website https://www.brcommunity.com and the Business Rules Group website http://www.businessrulesgroup.org.  

[2] Business rules build on facts and state the rules that govern the facts.  For example, the fact is “A person has a legal surname (or last name)”.  The business rule about that fact is “A person must have one legal surname.”

[3] For organizations that have not explicitly articulated the semantic model prior to engaging in logical data modeling, at a minimum the concepts from which the entities are derived should be defined in a glossary.  This glossary can later become part of an explicit semantic model.

[4] The storage and maintenance of historical data, and whether it should be included in logical data models and depicted on their diagrams, are important and complex topics. However, they are not discussed in this article.