SEMANTIC QUERY ON MATERIALS DATA BASED ON MAPPING MATML TO AN OWL ONTOLOGY
Data Science Journal, Volume 8, 10 January 2009

MatML plays an important role in materials data applications, and structure-aware query techniques (e.g., XPath and XQuery) are used to search MatML content. However, neither XPath nor XQuery can efficiently retrieve MatML data at a conceptual level. In this paper, we propose an approach that transforms MatML-based materials data into an OWL ontology, so that materials data can be explored in a more semantic way. The proposed method formally defines a set of rules to extract the corresponding OWL ontology (named MatOWL) from a given MatML schema. The instance transformation from MatML to MatOWL is implemented with the help of an intermediate object model, and the algorithm for this transformation is also given. Further, MatOWL can be mapped to other ontologies with logic rules to provide more semantic context for domain experts, and additional materials knowledge can be obtained by reasoning on the OWL ontology. An experimental prototype demonstrates the effectiveness of the proposed approach.


INTRODUCTION
In recent years, industrial communities and research institutions have accumulated vast amounts of materials data (Westbrook, 2003), which can greatly facilitate materials science research. However, the complexity of materials data makes it hard for users to integrate and share these scientific data (Iwata, Shichijo, & Ashino, 2001). Issues in the use of materials data, such as access, acquisition, and interoperability, have gradually become the most challenging problems facing materials informatics (Hunt, 2006). As stated at the materials informatics workshop held at the University of Queensland in 2006 (U Queensland, 2006), informatics is still in its infancy in the materials domain, and researchers in materials informatics should keep pace with other scientific communities such as biology, astronomy, and the social sciences. In order to exchange, integrate, and share materials data, a unified materials data model is needed. MatML (MatML Schema, 2004) and NMC-MatDB (NMC-MatDB Schema, 2007) are materials data models built for this purpose. A CODATA Task Group for Exchangeable Materials Data Representation (Task Group, 2006) was established in 2006 to focus on topics related to the interoperability of heterogeneous data resources. Swindells (2002) described the representation of engineering properties, including material properties, using the ISO 10303 standards (ISO, 1994). ISO 10303-45 Edition 2 (ISO, 2008) defines an information model that enables the representation of materials and other engineering properties of a product. This standard is being extended by ISO 10303-235, which focuses on the representation of the measurement process of a property and will shortly be published. The terminologies of properties and measurement processes are specified in ISO 13584 (ISO, 2001) as a series of dictionaries.
MatML is an XML-based markup language developed specifically to facilitate the exchange of materials information. It can represent materials property data uniformly, resolving syntactic and structural heterogeneity.
Because MatML is simple, flexible, and understandable, it offers many benefits (Sturrock, Begley, & Kaufman, 2001) to materials scientists and engineers. Research on and applications of MatML have emerged in recent years.
MatML is used in NSDL MatDL (Bartolo, Lowe, Sadoway, Powell, & Glotzer, 2005), where a web GUI has been developed to compare the properties of more than 80 types of materials. Begley and Howard-Reed (2005) have discussed the application of MatML to containment emission data and presented a mapping approach from an RDBMS to MatML that transforms these emission data into an exchangeable format. Bartolo and Lowe (2003) mapped MatML tags to Dublin Core elements to reuse the detailed information provided by MatML, so as to equip Dublin Core metadata with the capability of describing materials domain features. Varde, Begley, and Fahrenholz-Mann (2006) have presented some success stories for MatML in data mining applications, such as failure analysis and decision support systems.
Because MatML uses a tree-based structure to represent its content, queries (XPath or XQuery) on MatML are tightly coupled with the tree structure of the XML documents. However, materials scientists are more likely to write semantic queries using domain terms they are familiar with, rather than exploring the complex document structure, and MatML cannot meet this requirement elegantly. Moreover, MatML does not currently define the names of materials properties and experimental data items (Ashino & Fujita, 2006). As such, it lacks the capability to describe high-level conceptual semantics, and machine-processable ontologies are required to provide the semantic mappings between related terms (Cheung, Drennan, & Hunter, 2008). Furthermore, from the perspective of data integration, the semantics of XML data should be specified explicitly to resolve semantic heterogeneity. Ashino and Oka (2007) have shown that MatML is not adequate for data exchange between heterogeneous materials databases and have proposed a framework that uses an ontology to define the structure of domain concepts. Hunter, Little, and Schroeter (2008) have developed an ontology for the integration of disparate materials databases. These studies focus mainly on the semantic integration of databases. Unlike this existing work, we believe that both the integration and the utilization of MatML data are becoming increasingly important; however, this topic has not yet been well studied.
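To make the coupling concrete, the following minimal sketch queries a simplified, hypothetical MatML-like fragment with the limited XPath subset of Python's `xml.etree.ElementTree`. The element names follow MatML, but the fragment is not a complete or schema-valid MatML document. Even a simple question ("what tensile strength values are recorded?") requires knowing where `PropertyDetails` lives, that `PropertyData` refers to it through the IDREF attribute `property`, and that the value sits in a `Data` child.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified MatML-like fragment (not schema-valid MatML).
MATML = """
<MatML_Doc>
  <Material>
    <BulkDetails>
      <Name>Alloy A</Name>
      <PropertyData property="pr1"><Data format="float">520</Data></PropertyData>
    </BulkDetails>
  </Material>
  <Metadata>
    <PropertyDetails id="pr1"><Name>Tensile strength</Name></PropertyDetails>
  </Metadata>
</MatML_Doc>
"""

def values_of(doc, property_name):
    """Return the Data values recorded for a property, given its display name.

    Every step below encodes knowledge of the document layout rather than
    of the domain concept itself.
    """
    # 1. Find the id of the PropertyDetails entry with the requested name.
    ids = [pd.get("id")
           for pd in doc.findall("./Metadata/PropertyDetails")
           if pd.findtext("Name") == property_name]
    # 2. Follow the IDREF back from PropertyData and read its Data child.
    values = []
    for pid in ids:
        path = f"./Material/BulkDetails/PropertyData[@property='{pid}']"
        for pdata in doc.findall(path):
            values.append(float(pdata.findtext("Data")))
    return values

root = ET.fromstring(MATML)
print(values_of(root, "Tensile strength"))  # [520.0]
```

If the layout changed (for example, if `PropertyData` moved to another parent), the path in step 2 would silently return nothing, which is the brittleness a concept-level query aims to avoid.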
In this paper, we propose an approach that builds on the achievements of the semantic web (Berners-Lee, Hendler, & Lassila, 2001) by transforming MatML to an OWL (Bechhofer, Harmelen, Hendler, Horrocks, McGuinness, Patel-Schneider, et al., 2004) ontology, so that materials scientists can perform semantic queries on materials data derived from MatML documents. The approach formally defines a set of rules to extract the ontology (called MatOWL) from a given MatML schema and uses an intermediate object model to support instance transformation from MatML to MatOWL. MatOWL can easily be extended by mapping it to other ontologies with logic rules, thereby providing more semantic context to domain experts for semantic retrieval tasks.
The remainder of this paper is organized as follows. Related work is discussed in Section 2. Section 3 describes the problem. Section 4 introduces the rules used for extracting MatOWL. Section 5 describes the process and algorithm for instance transformation. Section 6 discusses how to enhance MatOWL and compose SPARQL queries on the translated materials data. Section 7 presents experimental results and demonstrates the prototype. Section 8 concludes the paper.

RELATED WORK
In recent years, much research has concentrated on how to build mappings from XML to ontologies efficiently. From the viewpoint of real-life applications, some researchers focus on metadata generation for web pages, while others focus on data integration. From the viewpoint of implementation techniques, some approaches aim at extracting an ontology from XML documents or XML schemas (Ferdinand, Zirpins, & Trastour, 2004; Bohring & Auer, 2005), while others are dedicated to mapping XML documents to an existing ontology (Reif, Jazayeri, & Gall, 2004; Rodrigues, Rosa, & Cardoso, 2005; Kobeissy, Genet, & Zeghlache, 2007).
WEESA (Reif, et al., 2004), JXML2OWL (Rodrigues, et al., 2005), and XMLTOWL (Kobeissy, et al., 2007) generate mappings between an XML schema and a specific OWL ontology. WEESA and XMLTOWL use XML to express mappings, whereas JXML2OWL adopts XSLT. Once the mappings are defined, these approaches can transform XML documents into the corresponding OWL instances automatically. The advantage of this kind of solution is that an existing ontology can be used to provide a semantic interpretation of the XML data.
The XML2OWL (Bohring & Auer, 2005) framework also supports instance transformation, and it can extract an OWL ontology from an XML schema or even from XML documents. The extraction and transformation rules are implemented in XSLT. Ferdinand, et al. (2004) proposed a method to convert XML to RDF as well as to transform an XML schema to OWL, although the two transformation processes are independent of each other.
From the perspective of data integration, Lehti and Fankhauser (2004) use an OWL ontology as the global view in a data integration system and then construct mappings between the XML schema of each data source and the OWL ontology. This approach does not materialize all instances from the data sources; instead, a query posed on the OWL ontology has to be translated into the corresponding XQuery. Going beyond simple value correspondences between XML and ontologies, An, Borgida, and Mylopoulos (2005) propose an approach that helps end-users construct complex mapping formulas between XML schemas and OWL ontologies. The mappings are expressed in a subset of first-order logic.
Compared with the approaches mentioned above, our method has the following features. First, we do not map an XML document to a particular ontology. Instead, we focus on how to extract an OWL ontology from the XML document and then how to associate it with other OWL ontologies. Second, we are concerned with a particular type of XML document (i.e., MatML) in the materials domain; hence, we use specific rules to extract information from MatML. Third, we use logic rules to express the relationships between the extracted ontology and other domain ontologies, so that the extracted ontology can be further used in a user-preferred manner.

PROBLEM DESCRIPTION
As more and more materials resources emerge in the MatML format, it is important to provide a convenient interface for users to access these scientific data. The motivation of our work is to bridge the semantic gap between materials experts and structure-aware queries on MatML, so that users can exploit the content of MatML in a more semantic way. Here, a structure-aware query is one that must, to some extent, be aware of the structural information of the source data. Thanks to the rapid development of semantic web technology, we can use technologies such as ontologies and logic rules to help solve this problem.
Before giving an overview of our approach, we introduce some related definitions.
The MatML version 3.1 schema contains more than 50 complex and simple types, and the elements and attributes of MatML are declared with these types. Accordingly, the MatML schema can be formally defined as follows.

Definition 1.
Applications in the materials science domain need more semantic support, such as that offered by an ontology.
An ontology is a formal, explicit specification of a shared conceptualization (Gruber, 1993; Studer, Benjamins, & Fensel, 1998). An ontology can capture shared domain knowledge (e.g., the key concepts of a domain and the important relationships between them), and it can also serve as the basis for logical reasoning on the information content of a specific domain. Based on the OWL language description (Bechhofer, et al., 2004), the ontology extracted from MatML can be defined as follows.
Based on these definitions, we propose an approach for mapping MatML to MatOWL so that users can perform semantic queries on materials data. Figure 1 gives an overview of the approach. After the following four steps, users can semantically access MatML-based materials data through SPARQL queries.
Step 1: Extraction of MatOWL. This step analyzes the structure of the MatML schema and extracts the concepts and properties of the MatOWL ontology from it using a set of heuristic rules. This step constructs the TBox of MatOWL.
Step 2: Instance transformation. Instance data are transformed from MatML documents to MatOWL. Consequently, MatOWL is populated with materials property data, and the ABox of MatOWL is constructed.
Step 3: Enhancement of MatOWL. This step adds more semantics to MatOWL by associating it with concepts from other existing domain ontologies through logic rules.
Step 4: Query building.A SPARQL query is built based on the domain terms that come from MatOWL or the associated domain ontology.
In the following sections, each of the four steps is described in more detail.

EXTRACTING MATOWL
Because MatML is a data model for materials property data, the MatML schema provides the basic vocabulary and structure for describing this domain. Therefore, extracting an ontology from the MatML schema is a natural and feasible solution. Hunter, et al. (2008) have designed an ontology based on MatML for database integration. Inspired by this idea, we analyze in depth the complex and simple types defined in the MatML schema and propose a set of heuristic rules for extracting both concepts and properties from the MatML schema to build MatOWL. The extracted ontology covers the concepts and properties of materials property data. The main rules for extracting MatOWL from the MatML schema are as follows.
Rule 1. Rule for class generation.
This rule implies that each t, which is a complex type or a simple type with an enumeration restriction, is transformed into a corresponding class in MatOWL. For example, complex type PropertyData and simple type

Formula (2) implies that for each complex type ct, if the type of its element (or attribute) x is a complex type or a simple type with an enumeration restriction, x is extracted as an object property op. Meanwhile, class f(ct.name) is added to the domain of op, and class f(x.type.name) is added to the range of op. For example, the element Name in the complex type PropertyDetails is extracted as the object property hasName. Formula (3) implies that for each complex type ct, if the type of its attribute x is xsd:IDREF, x is extracted as an object property. For example, because the attribute property in the complex type PropertyData has type xsd:IDREF, it is transformed into an object property.
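Rule 1 and formulas (2) and (3) can be sketched over a toy, in-memory view of a few schema declarations, as below. This is an illustration under stated assumptions, not the authors' implementation. The schema excerpt is simplified and hypothetical. Because XSD does not record which type an IDREF points to, the sketch takes the referenced type from a separate `IDREF_TARGETS` table. The naming function `f` and the `has…` property-naming convention are hypothetical choices made only so that the output matches the names used in the text (PropertyDetails to Property, hasName, isPropertyOf).

```python
from collections import defaultdict

# Hypothetical, simplified view of a few MatML schema declarations:
# complex type name -> list of (element/attribute name, declared type).
COMPLEX_TYPES = {
    "PropertyData": [("property", "xsd:IDREF"), ("Data", "Data"),
                     ("Notes", "xsd:string")],
    "PropertyDetails": [("id", "xsd:ID"), ("Name", "Name"), ("Units", "Units")],
    "Data": [("format", "DataFormat")],
    "Name": [],
    "Units": [],
}
ENUM_SIMPLE_TYPES = {"DataFormat"}  # simple types with an enumeration restriction

# Assumption: the target of each IDREF is supplied separately.
IDREF_TARGETS = {("PropertyData", "property"): "PropertyDetails"}

# Hypothetical naming choices that reproduce the names used in the text.
CLASS_NAMES = {"PropertyDetails": "Property"}
PROPERTY_NAMES = {("PropertyData", "property"): "isPropertyOf"}

def f(type_name):
    """Naming function: schema type name -> MatOWL class name."""
    return CLASS_NAMES.get(type_name, type_name)

def op_name(ct_name, member):
    default = "has" + member[0].upper() + member[1:]
    return PROPERTY_NAMES.get((ct_name, member), default)

def extract(complex_types, enum_types, idref_targets):
    # Rule 1: every complex type and every enumerated simple type -> class.
    classes = {f(t) for t in complex_types} | {f(t) for t in enum_types}
    domain, rng = defaultdict(set), defaultdict(set)
    for ct_name, members in complex_types.items():
        for name, typ in members:
            if typ in complex_types or typ in enum_types:  # formula (2)
                target = typ
            elif typ == "xsd:IDREF":                      # formula (3)
                target = idref_targets[(ct_name, name)]
            else:
                continue  # built-in types are left to the datatype-property rule
            op = op_name(ct_name, name)
            domain[op].add(f(ct_name))
            rng[op].add(f(target))
    return classes, dict(domain), dict(rng)

classes, domain, rng = extract(COMPLEX_TYPES, ENUM_SIMPLE_TYPES, IDREF_TARGETS)
print(sorted(domain))  # ['hasData', 'hasFormat', 'hasName', 'hasUnits', 'isPropertyOf']
```

Domains and ranges are collected as sets because the same element name (e.g., Name) can occur in several complex types; in OWL, a set of classes would typically be expressed as a union.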
Rule 3. Rule for data type property generation.

MatOO model
MatOO is a model used to facilitate the instance transformation. MatOO is an object tree with the same hierarchy as the element tree of MatML. Elements are defined as Java classes, and the children of an element are defined as its member variables. If a child element can occur multiple times, the corresponding member variable is defined as an object array; if the type of the child is xsd:IDREF, it is defined as the corresponding referenced type. Figure 4 shows part of the MatOO model. For example, PropertyData's child specimen has type xsd:IDREF, so it is defined as the class SpecimenDetails; because PropertyData's child ParameterValues may occur multiple times, it is defined as an array. The names of the Java classes and member variables come from the MatML schema, and getter and setter methods are defined in these classes. We also define a mapping table to describe the name correspondence between MatOO and MatOWL. Currently, we simply use an XML document to describe these name mappings, through which the names of MatOO classes and properties are mapped to the identifiers of MatOWL classes and properties.
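As a rough illustration of these MatOO conventions, the sketch below uses Python dataclasses in place of the paper's Java classes; plain attributes stand in for getters and setters. Class and field names are simplified. The XML layout of the name-mapping table (`<map matoo=... matowl=.../>`) and the MatOWL property names in it are hypothetical, since the text does not give the table's format.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET

@dataclass
class SpecimenDetails:
    id: str
    name: str

@dataclass
class ParameterValue:
    parameter: str
    data: float

@dataclass
class PropertyData:
    property: str
    # xsd:IDREF child: held as a reference to the referenced object itself
    specimen: Optional[SpecimenDetails] = None
    # child that may occur many times: held as a list (the "object array")
    parameterValues: List[ParameterValue] = field(default_factory=list)

# Hypothetical XML format for the MatOO -> MatOWL name-mapping table.
MAPPING_XML = """
<mappings>
  <map matoo="PropertyData.specimen" matowl="hasSpecimen"/>
  <map matoo="PropertyData.parameterValues" matowl="hasParameterValue"/>
  <map matoo="ParameterValue.data" matowl="value"/>
</mappings>
"""

def load_mapping(xml_text):
    """Read the table into a dict keyed by 'Class.member'."""
    root = ET.fromstring(xml_text)
    return {m.get("matoo"): m.get("matowl") for m in root.findall("map")}

mapping = load_mapping(MAPPING_XML)
spec = SpecimenDetails("s1", "Bar, 10 mm")
pd = PropertyData("pr1", specimen=spec,
                  parameterValues=[ParameterValue("Temperature", 25.0),
                                   ParameterValue("Temperature", 300.0)])
print(mapping.get("PropertyData.specimen"))  # hasSpecimen
```

Resolving an IDREF into a direct object reference at parse time means later stages can follow `pd.specimen` without consulting the id index again.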

Algorithm for instance transformation
Because MatOO and MatML have similar structures, it is easy to parse a MatML document and populate the instance data into MatOO objects. Transferring the data from MatOO to MatOWL is the key step of our approach. The algorithm for the instance transformation from MatOO to MatOWL is given in Table 1. The algorithm first gets all fields (members) of the root class C, where F_C is the set of fields of class C (line 1).
For each field f_i ∈ F_C, the algorithm then tries to transform its value into a MatOWL instance (line 2). Next, the property name pstr in MatOWL corresponding to C.f_i is retrieved from the mapping table between MatOO and MatOWL (line 3). If the property name is not null, the algorithm gets the ontology property object p from the ontology model OM (lines 4-5). Because C.f_i may be an array, for each object

provide more semantics than the MatML schema. Second, MatOWL itself is not a convenient vocabulary in which users can write queries. Users prefer to write queries using domain terms they are familiar with; that is, a user-oriented query view is more convenient for materials scientists. If we would like to obtain more interesting query results from MatOWL with an OWL inference engine, additional concepts, axioms, and rules should be added to MatOWL. Therefore, it is necessary to extend MatOWL.
In our opinion, there are two ways to make MatOWL more useful in the semantic world: (1) to add domain concepts, properties, and axioms to MatOWL directly; and (2) to map MatOWL to other existing ontologies in materials science. In both cases, DL axioms or logic rules can then be built to associate MatOWL with the new content.
After MatOWL is enhanced with new domain concepts through logic rules, writing semantic queries on the materials data becomes fairly intuitive for materials scientists. When users want to retrieve materials data from MatOWL, they can use not only concepts and properties from MatOWL but also domain concepts and properties from other domain ontologies. In this way, we can implement user-oriented, semantics-based queries on the MatOWL instances derived from MatML.
For example, suppose a user wants to perform a semantic query for materials selection using domain terms such as high-strength materials or corrosion-resistant materials. We should build (or reuse) a materials selection ontology that defines the related domain concepts and properties. We can then map the materials selection ontology to MatOWL by adding logic rules between them; in other words, we use knowledge from MatOWL to explain the concepts of the materials selection ontology (shown in Figure 6). Once the mapping rules are created, we can compose a query using terms from the materials selection ontology, which serves only as a virtual view for users. The rule engine can then answer the query automatically using the instances in MatOWL.
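The following minimal sketch illustrates explaining a materials-selection concept with MatOWL data. It rests on several assumptions. The MatOWL property names (`hasPropertyData`, `isPropertyOf`, `hasValue`), the individual `TensileStrength`, the concept `HighStrengthMaterial`, and the threshold of 500 (units assumed to be MPa) are all hypothetical. A single naive forward-chaining pass over a set of triples stands in for the OWL reasoner and rule engine, whose rule language the text does not specify. The rule encoded is:
HighStrengthMaterial(?m) <- hasPropertyData(?m, ?pd) AND isPropertyOf(?pd, TensileStrength) AND hasValue(?pd, ?v) AND ?v >= 500.

```python
# Hypothetical MatOWL ABox facts as (subject, predicate, object) triples.
FACTS = {
    ("alloyA", "hasPropertyData", "pd1"),
    ("pd1", "isPropertyOf", "TensileStrength"),
    ("pd1", "hasValue", 520.0),
    ("alloyB", "hasPropertyData", "pd2"),
    ("pd2", "isPropertyOf", "TensileStrength"),
    ("pd2", "hasValue", 310.0),
}

def objects(facts, s, p):
    """All objects o such that (s, p, o) is a fact."""
    return [o for (s2, p2, o) in facts if s2 == s and p2 == p]

def high_strength_rule(facts, threshold=500.0):
    """Mapping rule from MatOWL terms to the selection concept."""
    inferred = set()
    for (m, p, pd) in facts:
        if p != "hasPropertyData":
            continue
        if "TensileStrength" not in objects(facts, pd, "isPropertyOf"):
            continue
        if any(v >= threshold for v in objects(facts, pd, "hasValue")):
            inferred.add((m, "rdf:type", "HighStrengthMaterial"))
    return inferred

def instances_of(facts, cls):
    """A query phrased purely in the selection ontology's vocabulary."""
    return sorted(s for (s, p, o) in facts if p == "rdf:type" and o == cls)

closure = FACTS | high_strength_rule(FACTS)
print(instances_of(closure, "HighStrengthMaterial"))  # ['alloyA']
```

The query in the last line never mentions `PropertyData` or `TensileStrength`; only the rule does, which is the sense in which the selection ontology acts as a virtual view over MatOWL.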

EXPERIMENTAL PROTOTYPE
An experimental prototype has been implemented to evaluate the proposed method. The prototype provides four functions: (1) browsing materials property data in MatML documents; (2) transforming one or more MatML documents into the corresponding MatOWL instances; (3) executing SPARQL queries to retrieve the desired resources from the translated MatOWL; and (4) using the OWL reasoner and rule engine to return inferred query results.

Figure 1. Overview of the approach

The complex type Metadata in the MatML schema has eight sub-elements (i.e., AuthorityDetails, DataSourceDetails, MeasurementTechniqueDetails, ParameterDetails, PropertyDetails, SourceDetails, SpecimenDetails, TestConditionDetails), each of which has its own declared type. The classes corresponding to these types are defined as subclasses of f('Metadata'). Figure 2 gives an example that illustrates how the above extraction rules are used to transform the MatML schema. The left part of Figure 2 is a snippet of the MatML schema, and the corresponding MatOWL is on the right. The complex type PropertyData is extracted as the class PropertyData. The attribute property, whose type is xsd:IDREF, is transformed into the object property isPropertyOf, and the referenced type PropertyDetails is translated into the class Property, which becomes the range of isPropertyOf. The element Name is converted into the object property hasName, whose domain contains the class Property and whose range is the class Name. The element Notes is mapped to the data type property notes, whose range is xsd:string and whose domain contains the classes PropertyData, Specimen, DataSource, etc.

Figure 3. Process of instance transformation

Figure 4. MatOO Model (partial)

The algorithm in Table 1 is given in pseudocode. Its input is an ontology model OM (i.e., MatOWL), a root class C of MatOO, and an initial instance I. Its output is an ontology model OM' populated with OWL instances.

Figure 5 shows an example of the instance transformation from MatML to MatOWL. The left part is a snippet of an input MatML document, and the right part shows some of the MatOWL instances generated by the algorithm.

Figure 6 .
Figure 6.Mapping between ontologies using logic rules
This set of rules is used to generate cardinality restrictions for the generated properties. Exist(x.att) tests whether the attribute att is declared for x, and MaxCardinality(c, p) (MinCardinality(c, p)) represents a MaxCardinality (MinCardinality) restriction on property p for class c. In the MatML schema, if no minOccurs (maxOccurs) is declared for an element, minOccurs=1 (maxOccurs=1) is implied. Therefore, if neither minOccurs nor maxOccurs is declared, the restriction is "exactly one." The expression maxOccurs='unbounded' means there is no upper bound on the number of occurrences, so the corresponding property has no MaxCardinality restriction.
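The cardinality rules can be sketched by reading minOccurs/maxOccurs from element declarations with `xml.etree.ElementTree`. The schema fragment is simplified and hypothetical. The returned dictionaries are a stand-in for OWL restrictions, not an OWL API. Collapsing equal minimum and maximum into a single exact cardinality generalizes the "exactly one" case described above.

```python
import xml.etree.ElementTree as ET

def cardinality(attrs):
    """Map an element declaration's minOccurs/maxOccurs to OWL-style
    cardinality restrictions; both default to 1 when absent."""
    min_occurs = int(attrs.get("minOccurs", "1"))
    max_raw = attrs.get("maxOccurs", "1")
    restrictions = {"minCardinality": min_occurs}
    if max_raw != "unbounded":           # 'unbounded': no MaxCardinality at all
        restrictions["maxCardinality"] = int(max_raw)
    if restrictions.get("maxCardinality") == min_occurs:
        return {"cardinality": min_occurs}  # e.g. neither declared -> exactly one
    return restrictions

# Simplified, hypothetical schema fragment.
XSD = """<xsd:complexType xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="PropertyData">
  <xsd:sequence>
    <xsd:element name="Data" type="Data"/>
    <xsd:element name="ParameterValue" type="ParameterValue" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="Notes" type="xsd:string" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>"""
NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

ct = ET.fromstring(XSD)
restrictions = {el.get("name"): cardinality(el.attrib)
                for el in ct.findall(".//xsd:element", NS)}
print(restrictions)
```

Here Data yields an exact cardinality of 1, ParameterValue only a minimum of 0, and Notes a minimum of 0 with a maximum of 1.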

Table 1. An algorithm for instance transformation from MatOO to MatOWL
Algorithm. transform(OntModel OM, Object C, Individual I)

of the instance I (lines 7-8); otherwise, if p is an object property, we first get the range of p and create an instance of it (named RI_p), and link I and RI_p by property p; the algorithm then makes a recursive call using the object and RI_p as the root and the initial instance, respectively (lines 9-12). Finally, if pstr is null, no corresponding property name was found in the mapping table, so the algorithm recursively performs the lookup at the next level of the object tree to check whether a corresponding property name can be found there (line 13). When the recursion terminates, the algorithm has traversed all MatOO objects, their instance data have been populated into the ontology model OM, and OM becomes the result ontology model OM'.
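The recursive procedure described above can be sketched in Python as follows. This is a reconstruction from the prose, not the pseudocode of Table 1, and the line references in the comments are approximate. `OntModel` is a toy stand-in for a real OWL ontology model that stores triples and mints individuals. The MatOO classes, the mapping entries, and the MatOWL property names are all hypothetical.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from itertools import count
from typing import List, Optional

@dataclass
class OntModel:
    """Toy stand-in for an OWL ontology model (not a real OWL API)."""
    properties: dict                    # property name -> (kind, range class)
    triples: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def new_individual(self, cls):
        ind = f"{cls}_{next(self._ids)}"
        self.triples.append((ind, "rdf:type", cls))
        return ind

# Hypothetical MatOO classes (the paper uses Java classes).
@dataclass
class ParameterValue:
    parameter: str
    data: float

@dataclass
class PropertyData:
    data: float
    notes: Optional[str] = None
    parameterValues: List[ParameterValue] = field(default_factory=list)

@dataclass
class BulkDetails:
    name: str
    propertyData: List[PropertyData] = field(default_factory=list)

@dataclass
class Material:
    bulkDetails: BulkDetails

def transform(om, obj, ind, mapping):
    """Populate om with the data held by MatOO object obj, attached to ind."""
    for fld in fields(obj):                              # fields F_C (line 1)
        value = getattr(obj, fld.name)
        if value is None:
            continue
        items = value if isinstance(value, list) else [value]  # may be an array
        pstr = mapping.get(f"{type(obj).__name__}.{fld.name}")  # (line 3)
        if pstr is None:
            # No mapping entry: look one level deeper, same instance (line 13).
            for item in items:
                if is_dataclass(item):
                    transform(om, item, ind, mapping)
            continue
        kind, range_cls = om.properties[pstr]            # property p (lines 4-5)
        for item in items:
            if kind == "datatype":                       # literal value (lines 7-8)
                om.triples.append((ind, pstr, item))
            else:                                        # object property (lines 9-12)
                target = om.new_individual(range_cls)
                om.triples.append((ind, pstr, target))
                transform(om, item, target, mapping)
    return om

PROPERTIES = {
    "name": ("datatype", None),
    "hasPropertyData": ("object", "PropertyData"),
    "value": ("datatype", None),
    "notes": ("datatype", None),
    "hasParameterValue": ("object", "ParameterValue"),
    "parameterName": ("datatype", None),
}
MAPPING = {  # note: no entry for Material.bulkDetails
    "BulkDetails.name": "name",
    "BulkDetails.propertyData": "hasPropertyData",
    "PropertyData.data": "value",
    "PropertyData.notes": "notes",
    "PropertyData.parameterValues": "hasParameterValue",
    "ParameterValue.parameter": "parameterName",
    "ParameterValue.data": "value",
}

material = Material(BulkDetails("Alloy A", [
    PropertyData(520.0, parameterValues=[ParameterValue("Temperature", 25.0)])]))
om = OntModel(PROPERTIES)
root = om.new_individual("Material")
transform(om, material, root, MAPPING)
```

Because `Material.bulkDetails` has no mapping entry, its contents are attached directly to the `Material_1` individual, which mirrors the line-13 behaviour. The sketch does not guard against cycles; if IDREF-resolved references could form a loop in a real MatOO tree, a visited set would be needed.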