Conceptual data model - describes how data elements in the system are to be grouped
Entity - thing of interest about which data is to be held
Attribute - property or characteristic of an entity
Relationship - link or association between entities
Entity-relationship diagram - diagrammatic way of representing the relationship between the entities in a database
Problems with the traditional file approach:-
- Data redundancy - same data duplicated in many different files
- Data inconsistency - when the same items of data are held in several different files, the data should be updated in each file when it changes (if not -> data inconsistency)
- Program-data dependence - file format (i.e. which data fields constitute a record) must be specified in each program. Changes to the format of the data fields mean that every file which uses that program has to be changed
- Lack of flexibility - for non-routine data it could take weeks to assemble data from various files and write new programs to produce the required reports
- Non-sharable data - if two departments need the same data, either a second copy of the data would be made (-> data inconsistency) or the same file used (adding extra fields would mean programs would need to be changed to reflect the new file structure)
Database - collection of non-redundant data sharable between different applications
- all data is centralised and accessible by all applications
Problems with the database approach:-
- Unproductive maintenance - programs still dependent on the structure of the data (when one department needs to add a new field to a file, all programs accessing that file need to be changed)
- Lack of security - all data in the database is accessible by all applications
Database Management System (DBMS)
- layer of software between the applications and the data to improve security and eliminate unproductive maintenance
- the storage structure of the data is hidden from each application/user (program data independence)
- each user is given a limited view of the data according to need
Relational database
- most common type of DBMS
- data held in tables (relations) and tables are linked by means of common fields
Foreign key - primary key from one table used as foreign key in a second table (common field)
Normalisation
- process used to come up with the best possible design for a relational database
- no data is unnecessarily duplicated
- data is consistent throughout the database
- the structure of each table is flexible such that as few or as many items as are required may be entered
- the structure enables users to make complex queries relating data from different tables
First normal form - no repeating attributes or groups of attributes
Second normal form - in first normal form and no attribute that is not part of the primary key is dependent on only a portion of the primary key
Third normal form - no non-key dependencies (Boyce-Codd Normal Form - BCNF)
- a relational database can create links between tables representing different entities through the use of foreign keys
- it is impossible to normalise a database in a flat file system (since links between tables are not possible)
- flat file systems do not allow individual access rights to be set or multi-user access
SQL (Structured Query Language) - data access language which deals with the manipulation of tables and data
DML (Data Manipulation Language) - the part of SQL concerned with asking questions of a database
Schema - different levels or views from which a database may be considered
- External/user schema - individual's view of the database
- Conceptual/logical schema - overall view of the entire database, including entities, attributes and relationships, as designed by the database designer
- Internal/storage schema - how data will be stored (concerned with file organisation and access methods)
DDL (Data Definition Language)
- language used to define the logical structure and the files within the database
- alternative way of creating tables, forms, reports
- e.g. CREATE TABLE Friends
([FriendID] integer,
[LastName] text,
[Notes] memo,
CONSTRAINT[Index] PRIMARY KEY[FriendID]);
DML (Data Manipulation Language) - provides comprehensive set of commands to allow modification of the data within a database
- enables users to execute queries
- allows advanced users to write programs to carry out sophisticated processing of the database
- e.g. INSERT INTO Employees(FirstName, LastName, Title).
VALUES ('Harry', 'Washington', 'Trainee')
Open systems - provide a standard by which applications may be written to allow portability to multiple systems
OBDC (Open Database Connectivity)
- provides a means of accessing data held in one type of database from another database, spreadsheet etc.
- ODBC driver translates ODBC requests to the correct format for a particular data source
Database Mnagement System (DBMS)
- application program that provides an interface between the OS and the user in order to make access of the data as simple as possible
- allows users to store, retrieve and update information without having to be aware of the structure of the database
- creates and maintains the data dictionary
- ensures that problems do not arise when two people simultaneously try to access and update a record
- ability to recover database in event of system failure
- handles password allocation and access rights
Data dictionary - a database about the database
- what tables and columns are included in the present structure
- characteristics of each item of data (e.g. length, data type)
- restrictions on the values of certain columns
- relationships between items of data
- which program access which items of data (and whether read or write access)
Multi-access database - more than one person can view and update the same tables concurrently
Ensuring integrity - ensuring no data is lost or corrupted (making sure updates are not lost)
- opening entire database in exclusive mode - prohibits all simultaneous access (software does not have to check for potential conflicts)
- lock all records in the table being modified - (other users could only open the table in read-only)
- lock the record currently being edited
- no locks - the software makes the users aware of the situation if a record is being simultaneously updated from two or more workstations, and the users must resolve the problem
- open table in read-only mode - (if you don't need to update the table)
Deadlock (Deadly embrace)
- if two users are attempting to update two related records in the same table, this can lead to a situation where neither can proceed
- the DBMS must recognise when this situation has occurred and one of the two tasks must be aborted to allow the other to proceed
Software protection techniques
- allocate users to named groups and assign different permissions/privileges to each group
- each user will have password which they should change regularly
- the database me be encrypted
Client-server database
- DBMS server software runs on the network server and DBMS client software runs on individual workstations
- server software processes requests for data searches, sorts and reports that originate from individual workstations (the entire database does not need to be copied to each workstation)
- expensive resource (powerful computer and large database) can be made available to a large number of users
- client stations can update the database rather than just view that data
- consistency of database is maintained (only one copy of data is held)
- communication time between the client and server is minimised because only the query results are sent back to the client station (not the whole database)
Object-oriented database - stores the data and methods as objects that can be automatically retrieved and shared (more suited to handling graphics based or multimedia applications)