How to Write Integration Transforms

Importing source data from external sources into an ontology such as the Essential Architecture Manager repository means that custom transforms are required. This tutorial describes how the Essential Integration Engine works and provides details about how to develop your own custom transforms.

The Essential Integration Engine is the platform that is used by the Essential Integration Tab. However, the Engine is designed to be used both by graphical user interfaces, such as the tab, and by other integration solutions. The Engine applies XSLT transforms to XML data sources to create a Python integration script. This script is then split into chunks and executed against the Protege API.

A key feature of the Essential Meta Model is the strong semantics of each meta class. This means that the first step of any integration activity is to understand how the source information maps to the Essential Meta Model. Often, this mapping is from the specific instances of external information, rather than from generic representation formats (e.g. XMI, UML, XML). That is, we need to understand the semantics of the content of the external source, not just the semantics of the representation format.

This means that custom mappings (and therefore transformations) are a necessity and this tutorial describes how to write such transforms.

We will be maintaining a library of integration transforms in the Share area of the Essential Project website, as and when 'standard' imports are identified. By 'standard' we mean imports from specific technical formats (e.g. a specific XML schema) using consistent, agreed and repeatable semantics for each instance in the source data.

Each transform that you create can be re-used to synchronise, update and import new instances from the external source, e.g. from a configuration management database.

How the Essential Integration Engine Works

The Essential Integration Engine imports information and data from external sources by operating on the Protege Java API to ensure that the consistency and integrity of the ontology are preserved. The Standard Functions library provides a suite of functions that you call from your custom transform, and these functions handle most of the required API calls.


Essential Integration Engine Schematic

This diagram describes how the integration server operates. Transforms are written in XSL that produces the integration script for the specific source data instances. It is this script that uses the Protege API - via the Standard Functions - to import and synchronise the external data with your Essential ontology.

How to construct a transform

The key concept in the transforms is that they take the form of XSL documents that transform your source information into a Python script that imports and synchronises the source data. The Integration Engine executes the XSL against the source data and then executes the resulting script to complete the integration process.
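
To make the two-stage idea concrete, the following is a rough Python stand-in for what a transform produces. It is not XSL and it is not part of the Essential toolset; a real transform would emit the same kind of statements from <xsl:text> elements. The source XML layout (a `processes` element containing `process` nodes with `id` and `name` attributes) and the helper `build_script` are assumptions made purely for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical source data; the element and attribute names are assumptions.
SOURCE_XML = """
<processes>
  <process id="proc1234" name="Pay Invoice"/>
  <process id="proc5678" name="Raise Purchase Order"/>
</processes>
"""


def build_script(source_xml, repository_name):
    """Return integration script text with one statement per source node."""
    root = ET.fromstring(source_xml)
    statements = []
    for node in root.findall("process"):
        args = ["Business_Process", node.get("id"), repository_name, node.get("name")]
        # json.dumps yields a double-quoted, escaped literal (fine for simple values).
        quoted = ", ".join(json.dumps(arg) for arg in args)
        statements.append("anInstance = getEssentialInstance(%s)" % quoted)
    # Every statement is terminated by a newline.
    return "".join(statement + "\n" for statement in statements)


script = build_script(SOURCE_XML, "MySourceRepos")
print(script)
```

The printed text is the script that the Integration Engine would then execute in its second stage; nothing in this sketch touches the repository itself.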


See 'importEssentialInstances.xsl' for examples of the transform XSL.

To support the synchronisation, the Integration Engine needs to understand which external data source or repository the source information has come from. This is done by giving your data source a name that will be used each time you import from that source. For example, this could be the name of another repository, or it could even be the name of a specific spreadsheet that contains some source data you wish to import.
The first step of your transform is to ensure that this repository has been defined in the Essential repository.

Standard Start of the Transform Document

This repository definition should be part of a standard start to every transform XSL document. You can use the example transforms as templates on which to base your transforms. This should contain:

  • Specification of XSL version 2
  • A statement to import the java library to the script engine
  • A statement to execute the standard functions library script (this loads all the standard functions that your transform will need)
  • A statement calling the function to define your external repository.
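
As a hedged illustration of the last three items in this list, the opening statements of the generated script might look like the text built below. The XSL version declaration belongs to the stylesheet itself rather than the script, so it is not shown. The helper `build_script_header` is hypothetical and written in ordinary Python only to show the expected script text; in a real transform these lines are emitted by the XSL. The path to standardFunctions.txt, the repository name and the description are placeholder assumptions.

```python
def build_script_header(functions_path, repository_name, description):
    """Return the opening statements of an integration script as text."""
    statements = [
        # Import the java library into the script engine.
        "import java",
        # Load the standard functions library.
        'execfile("%s")' % functions_path,
        # Define (or re-use) the external repository for this data source.
        'defineExternalRepository("%s", "%s")' % (repository_name, description),
    ]
    return "\n".join(statements) + "\n"


header = build_script_header(
    "Documents/standardFunctions.txt",  # placeholder path
    "MySourceRepos",                    # placeholder repository name
    "Example source repository",        # placeholder description
)
print(header)
```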

XSL document header

As the transform is creating a Python script file, the XSL is mostly made up of <xsl:text> statements. Note that each statement in the script must be terminated by a newline (line feed) character, written using the &#xa; character reference.

Chunking Tokens

A restriction in the underlying Apache scripting engine limits the size of a script file that can be run in a single call to execute the script. The Essential Integration Engine splits long scripts into smaller chunks, while preserving the state between each chunk. This means that you can reference variables across chunks of the script, and in turn this means that the chunking has no impact on the script that your transform creates.
However, to ensure that the script is split at valid points - i.e. at the end of a script statement, rather than in the middle of one - you should write 'chunking tokens' after each node of the source data has been processed, e.g. after each application definition in a list of applications to import, or after a business process definition from a list of processes to import.

The chunking token to use is: ####_End_of_Node_####
Coding a Chunking Token
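
The idea behind chunking can be sketched in plain Python. This is only an illustration of splitting at tokens and keeping state between chunks; the Engine's actual splitting algorithm and size limit are not described here, so `split_into_chunks`, `run_chunks` and the `max_chars` limit are assumptions.

```python
CHUNK_TOKEN = "####_End_of_Node_####"


def split_into_chunks(script_text, max_chars):
    """Group token-delimited segments into chunks, never splitting a segment."""
    segments = [s.strip("\n") for s in script_text.split(CHUNK_TOKEN)]
    segments = [s for s in segments if s]
    chunks, current = [], ""
    for segment in segments:
        candidate = current + "\n" + segment if current else segment
        if current and len(candidate) > max_chars:
            # Adding this segment would exceed the limit: close the chunk.
            chunks.append(current)
            current = segment
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def run_chunks(chunks):
    """Execute each chunk in one shared namespace so variables persist."""
    namespace = {}
    for chunk in chunks:
        exec(chunk, namespace)
    return namespace


script = (
    "total = 1\n" + CHUNK_TOKEN + "\n"
    "total = total + 10\n" + CHUNK_TOKEN + "\n"
    "total = total + 100\n" + CHUNK_TOKEN + "\n"
)
chunks = split_into_chunks(script, max_chars=30)
result = run_chunks(chunks)
print(len(chunks), result["total"])
```

Because every chunk runs in the same namespace, the variable `total` set in the first chunk is still available in the second, which mirrors the way variables can be referenced across chunks of an integration script.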

The Process of Writing a Transform

Map the source information to the Essential Meta Model

The first step of the process of writing your own transform is to understand how your source information should map to the Essential Meta Model. Identify the Essential Meta Classes that you are importing instances of and the slots on those classes that you will be populating.

Plan your script

Having identified logically how your source information maps to the Essential Meta Model, you should understand what the integration needs to do.

  • Are you creating or updating instances in the Essential repository?
  • How do you split a source instance into the components in the Essential Meta Model, if a direct mapping is not possible?


If you are also importing relationships between the imported instances from the source data, it is very important that the semantics of the relationships match.

The Protege Script tab is very helpful when developing the transform scripts as it provides an interactive scripting environment where you can refine the script that your transform needs to build.
When using the script tab, make sure to start by entering the statements to import the java libraries and to load the standard functions, e.g.:

  • import java
  • execfile("Documents/standardFunctions.txt")

 

Write the XSL Document

Once you are clear about the script that you need to run to import your source information, you are ready to produce the XSL document that transforms your source data.

External Repository Instance References

To support the on-going synchronisation of external data and to avoid import duplications, each instance or relationship that is imported is assigned a unique identifier, called the External Repository Reference Instance. You should identify a unique identifier in your source data for each instance that you wish to import. This could be the repository identifier of the source instance as it appears in the source repository or it could be the name of the object to import, as long as it is unique. Each instance that is imported has this external identifier combined with the external repository identifier to create a unique reference for each instance in Essential that has been imported.

The standard functions take care of most of the technicalities of how these external references are managed.

Scripting Environment Basics

The scripting environment provides a key global variable, 'kb', that provides a reference to the overall Essential knowledge base. From this variable, you can access all the instances, classes and slots that you need in order to perform your imports.

Another important variable is 'includepath'. This is passed to the scripting environment by the Integration Engine and holds the full path that is specified on the Integration Tab. This path tells the Integration Engine where to look for referenced script files such as 'standardFunctions.txt'.

Examples

- Get a reference to a specific instance, e.g. of a Business Process
anInstance = getEssentialInstance("Business_Process", "proc1234", "MySourceRepos", "Pay Invoice")

- Get a reference to a Slot:
aSlot = kb.getSlot("my_slot_name")

- Set a slot value for an instance - you can use the setOwnSlotValue() function from the Protege API:
anInstance.setOwnSlotValue(kb.getSlot("my_slot_name"), "New Slot Value")

The standard functions provide a helper function for setting slot values on an instance, setSlot().
Take note of the type that the target slot expects. For String and Instance slots this is fairly straight-forward - remember to use "" around a string. If it's expecting a numeric (e.g. Float, Integer) remember to just use the value in the call.
So, we should re-write the above statement as:

setSlot(anInstance, "my_slot_name", "New Slot Value")
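
When a transform writes setSlot() calls into the script, the quoting rule above has to be applied to each source value. The helper below is a hypothetical sketch in ordinary Python of how the statement text differs for string and numeric values; `format_set_slot` and the slot name `my_numeric_slot` are assumptions and are not part of the standard functions.

```python
import json


def format_set_slot(instance_var, slot_name, value):
    """Return the text of a setSlot() call for the generated script."""
    if isinstance(value, (int, float)):
        literal = repr(value)              # numeric values are written bare
    else:
        literal = json.dumps(str(value))   # strings are double-quoted and escaped
    return "setSlot(%s, %s, %s)" % (instance_var, json.dumps(slot_name), literal)


print(format_set_slot("anInstance", "my_slot_name", "New Slot Value"))
print(format_set_slot("anInstance", "my_numeric_slot", 42))
```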

To avoid issues with mis-matches of cardinality on instance slots, we recommend using the addIfNotThere() standard function to set Instance Slots: addIfNotThere(anInstance, "my_slot_name", aReferencedInstance). See below.

For slots that can contain multiple values, use addOwnSlotValue(), e.g. to add an instance to a relationship slot - in this case, add an actor to a parent group:
aChildInstance = getEssentialInstanceContains("Group_Actor", "org_unit_11", "OrgModel", "IT Services")
anInstance = getEssentialInstanceContains("Group_Actor", "org_unit_08", "OrgModel", "My Company")
anInstance.addOwnSlotValue(kb.getSlot("contained_sub_actors"), aChildInstance)


However, in the last example there's a useful helper function that ensures that you're not adding a duplicate relationship, addIfNotThere(). So, this is how the call should look:

aChildInstance = getEssentialInstanceContains("Group_Actor", "org_unit_11", "OrgModel", "IT Services")
anInstance = getEssentialInstanceContains("Group_Actor", "org_unit_08", "OrgModel", "My Company")
addIfNotThere(anInstance, "contained_sub_actors", aChildInstance)

This function should also be used for all single-cardinality Instance slots as it automatically uses the correct Protege API call depending on whether the slot can accept multiple instances or not.
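
The behaviour described for addIfNotThere() can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the library's implementation and not the Protege API: `FakeInstance`, `add_if_not_there` and the single-valued slot name `parent_actor` are all hypothetical.

```python
class FakeInstance:
    """Minimal stand-in for a repository instance (illustration only)."""

    def __init__(self, name, multi_valued_slots):
        self.name = name
        self.multi_valued_slots = set(multi_valued_slots)
        self.slots = {}


def add_if_not_there(instance, slot_name, value):
    """Mimic the described behaviour: no duplicates, cardinality respected."""
    if slot_name in instance.multi_valued_slots:
        values = instance.slots.setdefault(slot_name, [])
        if value not in values:
            values.append(value)           # add only when not already present
    else:
        instance.slots[slot_name] = value  # single cardinality: set the value


parent = FakeInstance("My Company", multi_valued_slots=["contained_sub_actors"])
child = FakeInstance("IT Services", multi_valued_slots=[])

add_if_not_there(parent, "contained_sub_actors", child)
add_if_not_there(parent, "contained_sub_actors", child)  # second call changes nothing
add_if_not_there(child, "parent_actor", parent)          # hypothetical single-valued slot

print(len(parent.slots["contained_sub_actors"]))
```

Running the same import twice therefore leaves the relationship slot with a single entry, which is what makes repeated synchronisation safe.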

 

Standard Functions Library

For most custom transforms, the getEssentialInstance() function is all that is required. This uses the source identifier for an instance and attempts to find it in the Essential ontology repository. If found, the instance is returned and this can then be updated in terms of its slot values. However, if it is not found, a new instance is created in the Essential Architecture Manager repository using the specified source identifier to create a unique external reference in the Essential repository. This new instance is then returned and can then be updated in terms of its slot values.
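
The find-or-create behaviour described above can be sketched with an in-memory dictionary. This is an illustration only and not how the standard functions are implemented: `FakeRepository` is hypothetical, and keying the lookup on class, repository and external reference together is an assumption.

```python
class FakeRepository:
    """In-memory stand-in for the Essential repository (illustration only)."""

    def __init__(self):
        self.by_external_ref = {}

    def get_essential_instance(self, class_name, external_ref,
                               external_repository, instance_name):
        key = (class_name, external_repository, external_ref)
        instance = self.by_external_ref.get(key)
        if instance is None:
            # Not found: create a new instance and record its external reference.
            instance = {"class": class_name, "name": instance_name, "slots": {}}
            self.by_external_ref[key] = instance
        return instance


repo = FakeRepository()
first = repo.get_essential_instance(
    "Business_Process", "proc1234", "MySourceRepos", "Pay Invoice")
second = repo.get_essential_instance(
    "Business_Process", "proc1234", "MySourceRepos", "Pay Invoice")

print(first is second)
print(len(repo.by_external_ref))
```

The second call returns the instance created by the first, so re-running a transform against the same source updates existing instances rather than duplicating them.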

There are some variations of getEssentialInstance() that provide fuzzier matching, e.g. based on the name of the instance rather than a source identifier.

The following definitions describe the standard functions and how they should be used.

getEssentialInstance(theClassName, theExternalRef, theExternalRepository, theInstanceName)

# Get a reference to the instance of the specified class that has the specified external reference in the
# specified external repository. If such an instance cannot be found, create one with the specified
# name (real name, not instance name)
# theClassName - the name of the Essential meta class
# theExternalRef - the unique reference/instance ID of the element in the external repository
# theExternalRepository - the name of the external repository
# theInstanceName - the name of the instance that is being created/updated in the integration

getEssentialInstanceContains(theClassName, theExternalRef, theExternalRepository, theInstanceName)

# Find the instance by a contains case-sensitive match on the instance name in Essential repository
# Use this for getting instances that are expected to already be in the repository
# If not found, create a new one.
# theClassName - the Essential class for the instance
# theExternalRef - the External Reference ID for this instance
# theExternalRepository - the External Repository that theExternalRef applies to
# theInstanceName - the name of the instance in the Essential Repository

getEssentialInstanceContainsIgnoreCase(theClassName, theExternalRef, theExternalRepository, theInstanceName, theMatchString)

# Find the instance by a contains match - ignoring case - on the instance name in Essential repository
# Use this for getting instances that are expected to already be in the repository. Use theMatchString
# to specify the string to use as a match on existing instances.
# If not found, create a new one.
# theClassName - the Essential class for the instance
# theExternalRef - the External Reference ID for this instance
# theExternalRepository - the External Repository that theExternalRef applies to
# theInstanceName - the name of the instance in the Essential Repository
# theMatchString - the string that should be used to match against and find the instance

getEssentialInstanceIgnoreCase(theClassName, theExternalRef, theExternalRepository, theInstanceName, theMatchString)

# Function to find instances by a name match (precise, not contains), regardless of case
# Use this for getting instances that are expected to already be in the repository. Use theMatchString
# to specify the string to use as a match on existing instances.
# If not found, create a new one.
# theClassName - the Essential class for the instance
# theExternalRef - the External Reference ID for this instance
# theExternalRepository - the External Repository that theExternalRef applies to
# theInstanceName - the name of the instance in the Essential Repository
# theMatchString - the string that should be used to match against and find the instance

getEssentialNodeInstanceIgnoreCase(theClassName, theExternalRef, theExternalRepository, theInstanceName, theMatchString)

# Function to find Technology_Node instances by a name match (precise, not contains), regardless of case
# Use this for getting instances that are expected to already be in the repository. Use theMatchString
# to specify the string to use as a match on existing instances. Matching is based on just the hostname
# and strips any trailing domain components from the name, after the first '.'
# If not found, create a new one.
# theClassName - the Essential class for the instance
# theExternalRef - the External Reference ID for this instance
# theExternalRepository - the External Repository that theExternalRef applies to
# theInstanceName - the name of the instance in the Essential Repository
# theMatchString - the string that should be used to match against and find the instance
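
The matching variants described above differ only in how names are compared. The plain-Python predicates below are a sketch of those comparisons, not the library code. Two points are assumptions: that the existing instance name is searched for the match string (rather than the reverse), and that domain components are stripped from both sides for Technology_Node matching. The host names are made up.

```python
def contains_match(instance_name, match_string):
    """Case-sensitive 'contains' comparison."""
    return match_string in instance_name


def contains_ignore_case(instance_name, match_string):
    """'Contains' comparison, ignoring case."""
    return match_string.lower() in instance_name.lower()


def exact_ignore_case(instance_name, match_string):
    """Precise (not contains) comparison, ignoring case."""
    return instance_name.lower() == match_string.lower()


def hostname_only(name):
    """Drop any domain components after the first '.'."""
    return name.split(".", 1)[0]


def node_match_ignore_case(instance_name, match_string):
    """Precise, case-insensitive comparison on the hostname part only."""
    return hostname_only(instance_name).lower() == hostname_only(match_string).lower()


print(contains_match("IT Services Group", "it services"))
print(contains_ignore_case("IT Services Group", "it services"))
print(exact_ignore_case("IT Services", "it services"))
print(node_match_ignore_case("SRV01", "srv01.example.com"))
```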

getExternalRefInst(theExternalRefList, theExternalRepository)

# Return the external reference that applies to the specified External Repository from a list
# of external references from an Essential instance.
# theExternalRefList - the list of external reference records for an EssentialInstance
# theExternalRepository - the instance of the external repository

createExternalRefInst(theExternalRepositoryName, theExternalReference)

# Create a new External Reference record to be associated with an Essential instance.
# theExternalRepositoryName - the name (a String) of the external repository
# theExternalReference - the reference ID that is used in the specified external repository

getExternalRepository(theExternalRepositoryName)

# Get a reference to the instance of External_Repository that has the specified name
# theExternalRepositoryName - the name of the external repository

timestamp()

# Return a string of the current date/time to be used for timestamping.

setOrUpdateTechInstAttributeByName(theAttributeName, theAttributeValue, theInstance)

# Update the named attribute associated with the specified technology instance object
# or create it if it's not already been defined

setOrUpdateTechNodeAttributeByName(theAttributeName, theAttributeValue, theInstance)

# Update the named attribute associated with the specified technology Node (theInstance) object
# or create it if it's not already been defined

setSlot(theInstance, theSlotName, theInstanceToAdd)

# Set a slot value to the specified value. To be used with single-cardinality slots
# only
# theInstance - the instance to which we wish to set the slot to contain theInstanceToAdd
# theSlotName - the name of the slot on theInstance
# theInstanceToAdd - the instance to set in theSlotName slot on theInstance

addIfNotThere(theInstance, theSlotName, theInstanceToAdd)

# Add the slot value to the specified instance only if it's not already there.
# theInstance - the instance to which we wish to add theInstanceToAdd
# theSlotName - the name of the slot on theInstance
# theInstanceToAdd - the instance to add to theSlotName slot on theInstance
# v1.2.1: If theSlotName is a single cardinality slot, use setSlot()

defineExternalRepository(theExternalRepository, theDescription)

# Define a new External Repository or ignore definition if the repository is already known
# theExternalRepository - the name of the external repository
# theDescription - a description of the external repository

addNewEAMAttribute(theName, theDescription, theUnit)

# Function to add a new Attribute instance to the Essential model.
# theName - the name of the Attribute
# theDescription - a description of the attribute
# theUnit - the units of the attribute value, e.g. MB, Mbps, kg or '_' or space if not applicable

getNameSlot(theInstance)

# Find the right name slot for a given instance. If it's an EA_Class instance, the
# name slot will be returned. If it's an EA_Relation instance, the relation_name slot
# is returned. If it's an instance of :EA_Graph_Relation, the :relation_name slot is
# returned
# theInstance the instance for which the correct name slot is required.
# returns a string name of the correct slot to use.

getNameSlotForClass(theClassName)

# Find the right name slot for a given class. If it's an EA_Class, the
# name slot will be returned. If it's an EA_Relation class, the relation_name slot
# is returned. If it's a class of :EA_Graph_Relation, the :relation_name slot is
# returned
# theClassName the name of the class for which the correct name slot is required.
# returns a string name of the correct slot to use.
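
The name-slot rule described for getNameSlot() and getNameSlotForClass() amounts to a three-way choice. The sketch below expresses it over a hand-written class hierarchy; it is not the library implementation. The `SUPERCLASSES` table, the relation class names in it and the order of the checks are assumptions made for illustration.

```python
# Hypothetical class hierarchy: class name -> list of its superclasses.
SUPERCLASSES = {
    "Business_Process": ["EA_Class"],
    "Example_Relation": ["EA_Relation"],
    "Example_Graph_Relation": [":EA_Graph_Relation"],
}


def get_name_slot_for_class(class_name, superclasses=SUPERCLASSES):
    """Return the name of the slot that holds the name for the given class."""
    ancestry = [class_name] + superclasses.get(class_name, [])
    if ":EA_Graph_Relation" in ancestry:
        return ":relation_name"
    if "EA_Relation" in ancestry:
        return "relation_name"
    return "name"  # EA_Class and anything else in this sketch


print(get_name_slot_for_class("Business_Process"))
print(get_name_slot_for_class("Example_Relation"))
print(get_name_slot_for_class("Example_Graph_Relation"))
```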

 

 