http://www.datagovernance.com/
Unit 4: Data Management
Unit 5: Data Management
Wang,X., ENGO351
Data management: Outline
Definition of a database
Database management systems
Conceptual data modelling
Logical data modelling
The entity-relationship model
The relational database model
The standard query language (SQL)
Unit 5: Data Management
Wang,X., ENGO351
Database A database is a logical, structured collection of data about things, their attributes and their relationships to each other
A database is a collection of related data.
Within a GIS, these things and relationships have a spatial component
Unit 5: Data Management
Wang,X., ENGO351
Spatial Data Examples Examples of non-spatial data Names, UCID, grades of a student Examples of Spatial data Census Data NASA satellites imagery - terabytes of data per day Weather and Climate Data Rivers, Farms Medical Imaging Exercise: Identify spatial and non-spatial data items in A phone book A cookbook with recipes
Unit 5: Data Management
Wang,X., ENGO351
Why use database? Data Redundancy and Inconsistency - The same piece of information may be duplicated in several different files. Difficulty in Accessing Data: Conventional file processing environments do not allow data to be retrieved in a convenient and efficient manner. Data Isolation: Data is scattered in a number of different files. Linking these data to provide new information may require a substantial amount of programming to answer each specific query. Concurrent Access Problems: Multiple s attempting to access the same file may lead to problems. Security Problems: Certain s should not be allowed to access certain data-sets and not be permitted to edit certain data that they do not own. Integrity Problems: Some data values should satisfy certain consistency constraints. Unit 5: Data Management
Wang,X., ENGO351
Database management system A database management system (DBMS) is a collection of software programs that facilitates the processes of defining, construction, and manipulating the database for various applications
A DBMS provides tools for data input, search, retrieval, manipulation, and output
Most commercial GIS include database management tools for local databases
ArcGIS uses Microsoft Access
Unit 5: Data Management
Wang,X., ENGO351
Data model A data model is a description or view of the real world and data modeling is the process that formalizes the description or view at different levels of data abstraction
The real-world is made up of complex spatial objects and phenomena
A data model tends to be tailored to a specific application or problem context
Different s may have different data models
Unit 5: Data Management
Wang,X., ENGO351
Data models and levels of abstraction Data modeling involves three steps corresponding to increasing levels of abstraction:
conceptual modeling
logical modeling
physical modeling
Unit 5: Data Management
Wang,X., ENGO351
Conceptual model (1)
A conceptual model represents the ’s perception of the real world Data abstraction is strictly limited to the description of the information content of the ’s view of the real world, without any concern for computer implementation (Brown, 1997) Unit 5: Data Management
Wang,X., ENGO351
Conceptual model (2)
The function of a conceptual data model is to provide the necessary language to describe how we naturally conceptualize data organization Although conceptual data modeling is part of the database design process, conceptual models are database-independent
A conceptual model provides a basic reference for s who need to understand the structure of the data in the system
Unit 5: Data Management
Wang,X., ENGO351
Conceptual model (3)
A conceptual data model provides a way to communicate between s, designers and computers
(Worboys, 2004) Unit 5: Data Management
Wang,X., ENGO351
The entity-relationship model
The entity-relationship (E-R) data model is one of the most commonly-used conceptual data models for GIS data modelling
It is based on the concepts of: entities attributes
relationships
to represent real-world features, their properties and their relationships
Unit 5: Data Management
Wang,X., ENGO351
The entity-relationship diagram
An entity-relationship diagram (E-R diagram) can be used to express the features and properties of an E-R model
(Lo and Yeung, 2007)
Unit 5: Data Management
Wang,X., ENGO351
Entity An entity is a ‘thing’ or ‘object’ in the real world that is distinguishable from all other objects
each person in a classroom is an entity
An entity has an independent existence
physical existence: a person, a car, a forest
conceptual existence: a company, a job, an agenda
Entities
can be thought of as nouns.
Examples:
Entities
a computer, an employee, a song, a mathematical theorem
are represented as rectangles.
Unit 5: Data Management
Wang,X., ENGO351
Relationships between entities
A relationship captures how two or more entities are related to one another. i.e. the relationship ‘performs’ between the entity artist and song
Relationships can be thought of as verbs, linking two or more nouns Relationships are represented as diamonds, connected by lines to each of the entities in the relationship.
Unit 5: Data Management
Wang,X., ENGO351
Attributes
Entity and relations can have attributes. i.e. an employee entity might have an SIN attribute i.e. entity City has attribute types name, population, density
The attributes are associated to each occurrence of the entity type
Attributes are represented as ellipses connected to their owning entity sets by a line.
Unit 5: Data Management
Wang,X., ENGO351
Relationship types (1)
Relationship types are subdivided into three categories: one-to-one (1:1) many-to-one (N:1) many-to-many (M:N)
Participation constraint: entity occurrence may only exist if it participates in a relationship. One can have total and partial participation constraints
A
B Double line m e ans B c a nnot exist without A. A c an exist without B.
i.e. one director is the manager of at most one cinema (double lines means mandatory)
(Worboys, 2004) Unit 5: Data Management
Wang,X., ENGO351
Relationship types (2)
In a many-to-one relationship, many occurrences of one entity type may have a relationship with at most one occurrence of another entity type i.e. many cinemas are located in a town
(Worboys, 2004)
Unit 5: Data Management
Wang,X., ENGO351
Relationship types (3)
In a many-to-many relationship, many occurrences of one entity type may have a relationship with many occurrences of another entity type i.e. many roads connect many cities
(Worboys, 2004, p. 58) Unit 5: Data Management
Wang,X., ENGO351
Process of creating E-R model 1. Start with textual description 2. Identify entities (nouns) 3. Tabulate entities 4. Determine relationships 5. Determine cardinality ratio/participation constraints 6. Determine attributes & key attributes
Unit 5: Data Management
Wang,X., ENGO351
An Example: World Database Conceptual Model 3 Entities: Country, City, River 2 Relationships: capital-of, originates-in Attributes listed in the figure
Unit 5: Data Management
Wang,X., ENGO351
Logical model (1)
A logical data model represents the real world by means of diagrams, lists, and tables designed to reflect the recording of the data in of some formal language
Logical models are software dependent; they must be expressed in of the language of a specific database management system (i.e. relational model)
Unit 5: Data Management
Wang,X., ENGO351
Logical model (2)
This figure illustrates the logical model that was created from the conceptual model previously shown
(Brown, 1997) Unit 5: Data Management
Wang,X., ENGO351
The relational database model (1) The relational database model, developed by Codd in the early 1970s, is the most widely used logical database model in the computer industry
It
represents the database as a collection of tables (simple files) called relations.
Unit 5: Data Management
Wang,X., ENGO351
The relational database model (2)
(Aronoff, 1993)
Unit 5: Data Management
Wang,X., ENGO351
The relational database model (3) Each table in the database represents an entity type identified in the data modeling process
(Brown, 1997) Unit 5: Data Management
Wang,X., ENGO351
Column and tuple In a particular table, each column represents an attribute
each row, called a tuple (or record), represents a collection of data values associated to the occurrence of an entity
(Lo and Yeung, 2007)
Unit 5: Data Management
Wang,X., ENGO351
Relation
A relation can be simply thought of as a table of data
A relation is made up of a set of tuples There can be any number of tuples in a relation Within a relation, the logical order of the tuples is not important
Unit 5: Data Management
Wang,X., ENGO351
Domain A domain is a set of attribute values associated to each column of a relation
A domain is identified by an attribute name
There are occasions when the values of some attributes within a particular tuple are unknown or missing
a special value, called null (which is not zero), is assigned
Unit 5: Data Management
Wang,X., ENGO351
Primary and foreign keys (1) Tables, in a relational database, are connected to each other using keys A primary key represents one or more attributes (columns in the table) whose values can uniquely identify a record in a table Its counterpart in another table for the purpose of linkage is called a foreign key
http://www.datagovernance.com/
Unit 5: Data Management
Wang,X., ENGO351
Primary and foreign keys (2) Thus, a key common to two tables are used to establish connections between corresponding records in the tables
Unit 5: Data Management
Wang,X., ENGO351
Primary and foreign keys (3) Keys are a simple way to connect tables within a relational database
Collector
Unit 5: Data Management
Wang,X., ENGO351
Primary key Attribute values of the primary key allow s to identify individual tuples uniquely
e.g. SIN, an identification number
The primary key must not contain null values (Why?)
What if cannot use of a certain attribute alone in a relation to identify the unique identification of a tuple?
use one or more additional attributes to form a compound key create an additional attribute to hold a unique identifier for each occurrence of an entity
Unit 5: Data Management
Wang,X., ENGO351
Foreign key A foreign key is an attribute in a relation that is a primary key in another relation
The identical values of the primary and the foreign keys make it possible to logically link different tuples in different relations
Unit 5: Data Management
Wang,X., ENGO351
and Relate Two
common operations for linking tables in a relational database are and relate. A
operation brings together two tables by using a key that is common to both tables. A
relate operation temporarily connects two tables but keeps the tables physically separate.
35
Figure 8.10 (Chang, 2012) Primary key and foreign key provide the linkage to the table on the right to the feature attribute table on the left.
36
Figure 8.11 (Chang, 2012) This example of a many-to-one relationship in the SSURGO database relates three tree species in cotreestomng to the same soil component in component. 37
Figure 8.12 (Chang, 2012) This example of a one-to-many relationship in the SSURGO database relates one soil map unit in mapunit to two soil components in component.
38
Normalisation Normalising is the splitting of the database into multiple tables. Two main reasons for normalising a database: prevents unnecessary duplication of data, thus conserving time and disk space, and in some cases, preventing errors. makes it easier to extract exactly the information from the database.
Each normal form is built on the normal form before it. e.g. if a table is in 3-NF it is in 2_NF
Unit 5: Data Management
Wang,X., ENGO351
Normal Forms 1-NF: attributes are atomic or single valued. 2-NF: All non-primary attributes are fully dependent on the primary key; no non-primary attributes are functionally determined by a subset of the key 3-NF: No non-prime attributes are functionally determined by another non-prime attribute; all nonprimary attributes are directly dependent on the primary key. Example: A company runs many projects. Employees work on particular tasks on different projects. How do we invoice the time each employee allocates to their different tasks on each project?
Unit 5: Data Management
Wang,X., ENGO351
Guidelines for relational database design 1. Create E-R model 2. Produce table for each entity 3. For 1:1 relationships, add foreign key to either participating entities For 1:M relationships, add foreign key to entity on "Many" side of relationship 4. For N:M relationships, create a new table 5. Check design is "normalised"
Unit 5: Data Management
Wang,X., ENGO351
Advantages of the relational model
A relational database is simple and flexible
Each table in the database can be prepared, maintained, and edited separately from other tables
The tables can remain separate until a query or an analysis requires that attribute data from different tables be linked together
Unit 5: Data Management
Wang,X., ENGO351
Physical model
A physical data model describes the physical storage of the data in the computer by record formats and access paths
It is hardware dependent and is concerned primarily with the implementation details of a database
It is intended for the system programmer and database manager, not the general
(Lo and Yeung, 2007)
Unit 5: Data Management
Wang,X., ENGO351
(Lo and Yeung, 2007)
Unit 5: Data Management
Wang,X., ENGO351
SQL SQL (Structured Query Language) is a data query language designed for relational databases
SQL has been developed by IBM in the 1970s and many commercial database management systems such as Oracle, DB2, Access, and Microsoft SQL Server have since adopted the query language
This is an English-like language that consists of a set of powerful and flexible commands for the manipulation of the data in the relational tables
Unit 5: Data Management
Wang,X., ENGO351
Relational algebra The SQL language is made with a set of relational operators and commands that can be combined to query a database. These operators and commands form what is called relational algebra
Unit 5: Data Management
Wang,X., ENGO351
Relational operators (1)
With SQL, six logical operators can be used:
equal to
not equal to
smaller than
greater than
smaller or equal to
greater or equal to
Unit 5: Data Management
Wang,X., ENGO351
Relational operators (2) With SQL, three boolean operators can be used:
NOT
AND
OR
The shaded area represents:
the complement of A (NOT)
the union of A and B (OR)
the intersection of A and B (AND)
Unit 5: Data Management
(Chang, 2006)
Wang,X., ENGO351
SQL commands (1) SQL is used to create and query the database using a set of SQL commands. e.g.
the create command creates databases and tables
the select command selects rows that have been inserted into the tables
By using SQL commands, the needs only to specify the tables, columns, and row qualifiers to retrieve any data item in the entire database
the s do not need to know the technical details of how the
data are stored
Unit 5: Data Management
Wang,X., ENGO351
SQL commands (2)
SQL commands are used to perform two main functions:
to define the database structure (data definition)
e.g., CREATE
to
insert, modify, and retrieve data from the database (data manipulation)
e.g. SELECT
Unit 5: Data Management
Wang,X., ENGO351
CREATE command
To create a database:
CREATE DATABASE database_name
To create a table in a database: CREATE TABLE table_name (column_name1 data_type, column_name2 data_type, ....... )
CREATE TABLE ADDRESS_BOOK ( NAME CHAR (30) COMPANY CHAR (20) E-MAIL CHAR (25) ) NAME COMPANY
Unit 5: Data Management
ADDRESS_BOOK
E-MAIL
Wang,X., ENGO351
INSERT and DELETE
To insert new rows into a table:
INSERT INTO table_name VALUES (value1, value2,....)
INSERT INTO River (Name, Origin, Length) VALUES (‘Mississippi’, ‘USA’, 6000)
To delete rows in a table: DELETE FROM table_name WHERE column_name = some_value
DELETE FROM City WHERE Country = ‘Canada’;
NAME
COMPANY
E-MAIL
John Smith
Travelcity
[email protected]
INSERT INTO ADDRESS_BOOK (NAME, COMPANY, E-MAIL) VALUES (‘John Smith’, ‘Travelcity’, ‘
[email protected]’) DELETE FROM ADDRESS_BOOK WHERE COMPANY = ‘Travelcity’; Unit 5: Data Management
Wang,X., ENGO351
SELECT command The select command extracts data items in specified rows of a table. It returns a new table that has a subset of tuples of the original.
SELECT * FROM table_name
SELECT column_name(s) FROM table_name
SELECT column_name(s) FROM table_name(s) WHERE conditions
GROUP BY column_name
Unit 5: Data Management
Wang,X., ENGO351
World database data tables
Unit 5: Data Management
Wang,X., ENGO351
SELECT Example 1. • Simplest Query has SELECT and FROM clauses • Query: List all the cities and the country they belong to.
SELECT Name, Country FROM CITY
Result
Unit 5: Data Management
Wang,X., ENGO351
SELECT Example 2. • Commonly 3 clauses (SELECT, FROM, WHERE) are used •Query: List the names of the capital cities in the CITY table. SELECT Name FROM CITY WHERE CAPITAL=‘Y’ SELECT * FROM CITY WHERE CAPITAL=‘Y ’
Result
Unit 5: Data Management
Wang,X., ENGO351
SELECT Example 3 Query: List the attributes of countries in the Country relation where the life-expectancy is less than seventy years. SELECT Co.Name,Co.Life-Exp FROM Country Co WHERE Co.Life-Exp <70 Note: use of alias ‘Co’ for Table ‘Country’
Result
Unit 5: Data Management
Wang,X., ENGO351
Multi-table Query Examples Query: List the capital cities and populations of countries whose GDP exceeds one trillion dollars. Note:Tables City and Country are ed by matching “City.Country = Country.Name”.
SELECT Ci.Name,Co.Pop FROM City Ci,Country Co WHERE Ci.Country =Co.Name AND Co.GDP >1000.0
AND Ci.Capital=‘Y ’
Unit 5: Data Management
Wang,X., ENGO351
Multi-table Query Example Query: What is the name and population of the capital city in the country where the St. Lawrence River originates?
SELECT Ci.Name, Ci.Pop FROM City Ci, Country Co, River R WHERE R.Origin =Co.Name AND Co.Name =Ci.Country AND R.Name =‘St.Lawrence ’ AND Ci.Capital=‘Y ’
Unit 5: Data Management
Wang,X., ENGO351