5 Principles of Database Systems

Concepts in Database Systems

  • Database Modeling
  • Data Management
  • Integrity Rules
  • Backup and Recovery
  • Data Security

Database management systems govern an extraordinary amount of everyday life. A database system houses healthcare and banking data, marketing and demographic data, and a vast number of other types of information that inform our day-to-day lives, both personal and professional.

As a result, the creation and maintenance of database systems are in enormous demand, and more and more people are studying data science and database management to take advantage of the hundreds of thousands of jobs available in this arena. According to the Bureau of Labor Statistics, data science and database administration are growing fields.

For data science students who want to specialize in database management systems, a number of unifying concepts and areas of study are introduced in scholastic programs. Here are five database management principles.

1. Database Modeling

Database modeling determines the structure of a database and the methods used to store and retrieve the data housed within it. Database models also determine how information is organized. There are a number of different database models used in contemporary data science for different purposes. Some of the most commonly used include:

Relational Database Model: Relational database systems are the most widely used model. They organize data into tables using rows and columns. Each row represents a record, while each column represents an attribute of the record. Relationships between tables are then established using keys. Popular relational database systems include MySQL, an open-source relational database management system (RDBMS), and Oracle.
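To make the idea of tables, rows, columns, and keys concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the sample values are assumptions made up for illustration; they do not come from any particular system.

```python
import sqlite3

# An in-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- key that identifies each row
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")


def order_totals(connection):
    """Join the two tables on their shared key and total each customer's orders."""
    return connection.execute("""
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
        ORDER BY c.name
    """).fetchall()
```

Calling order_totals(conn) returns one row per customer, built by following the customer_id key from orders back to customers.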

Hierarchical Database Model:  Data in a hierarchical database model is organized in a tree-like structure with parent/child relationships.  Each parent can have multiple child records, but each child can only belong to one parent.  Most hierarchical models have been replaced with relational databases.
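The parent/child rule can be sketched in a few lines of Python. The Node class and the department names below are hypothetical, chosen only to show that a parent may hold many children while each child records exactly one parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Exactly one parent per node; excluded from repr/eq to avoid cycles.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        """Create a child record that belongs to this node only."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up the single chain of parents to the root."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Node("company")
engineering = root.add_child("engineering")
sales = root.add_child("sales")
databases = engineering.add_child("databases")
```

Because every record has a single parent, any record can be located by one unambiguous path from the root, which is the property path() relies on.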

Object-Oriented Database Model: The object-oriented model stores data as objects, in line with object-oriented programming concepts such as classes and inheritance. It is well suited for applications that require complex data structures.

Document Database Model: Document databases are a type of NoSQL database that stores data in a semi-structured format using JSON or BSON documents. They are used to handle unstructured or rapidly changing data. Examples include Couchbase and MongoDB.
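As a rough illustration of semi-structured storage, the sketch below keeps JSON documents in a plain Python list and filters them by field. The find helper and the sample documents are assumptions for illustration only; this is not the query API of Couchbase, MongoDB, or any other product.

```python
import json

# Two documents in the same collection with different shapes:
# no fixed schema is required.
documents = [
    '{"_id": 1, "name": "Ada", "email": "ada@example.com"}',
    '{"_id": 2, "name": "Grace", "tags": ["navy", "compilers"],'
    ' "address": {"city": "Arlington"}}',
]


def find(collection, **criteria):
    """Return the parsed documents whose top-level fields match all criteria."""
    results = []
    for raw in collection:
        doc = json.loads(raw)
        if all(doc.get(key) == value for key, value in criteria.items()):
            results.append(doc)
    return results
```

A lookup such as find(documents, name="Grace") returns the whole nested document, including fields that other documents in the collection do not have.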

Graph Database Model: Graph database models represent and store data as graphs. They consist of nodes and edges. Graph database models are well suited to complex relationships and graph-like data structures. One of the most popular graph database systems is Neo4j.
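A small sketch of nodes and edges, assuming a hypothetical "knows" relationship between people. It stores the graph as an adjacency mapping and uses a breadth-first search to answer a relationship question; it does not represent how Neo4j or any other product stores or queries data.

```python
from collections import deque

# Each edge connects two nodes; the names are invented for illustration.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]


def build_graph(edge_list):
    """Store an undirected graph as node -> set of neighbouring nodes."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: fewest edges between two nodes, or None."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


graph = build_graph(edges)
```

Questions like "how many hops separate two people" follow the edges directly, which is the kind of query the graph model is designed around.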

Time-Series Database Model:  Time-series database models are used for storing and managing time-stamped data like financial market data or log files.  They allow the user to query and analyze time-based data patterns.
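The sketch below shows the core idea of a time-range query over time-stamped data, assuming a handful of invented price readings that are already sorted by timestamp. Real time-series databases add compression, retention, and downsampling on top of this.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Hypothetical readings: (timestamp, value), kept in timestamp order.
readings = [
    (datetime(2024, 1, 1, 9, 0), 100.0),
    (datetime(2024, 1, 1, 9, 5), 102.0),
    (datetime(2024, 1, 1, 9, 10), 101.0),
    (datetime(2024, 1, 1, 9, 15), 105.0),
]
timestamps = [ts for ts, _ in readings]


def range_query(start, end):
    """Return readings with start <= timestamp <= end using binary search."""
    low = bisect_left(timestamps, start)
    high = bisect_right(timestamps, end)
    return readings[low:high]


def average(points):
    """Mean value of a non-empty list of (timestamp, value) points."""
    return sum(value for _, value in points) / len(points)


window = range_query(datetime(2024, 1, 1, 9, 5), datetime(2024, 1, 1, 9, 10))
```

Because the data is ordered by time, the window is found with two binary searches rather than a scan of every reading.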

Columnar Database Model: Unlike other models in which data is stored in rows, columnar database models store data in columns. They are well suited for big data analytics and data warehousing. Systems commonly grouped under this heading include Google Bigtable and Apache Cassandra, which are more precisely described as wide-column stores.
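The difference between row and column layouts can be sketched with plain Python structures. The sales records are invented, and the to_columns helper assumes every record has the same fields.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "sales": 120},
    {"id": 2, "region": "west", "sales": 80},
    {"id": 3, "region": "east", "sales": 50},
]


def to_columns(records):
    """Pivot row records into one list per field (column layout)."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field touches only that column's values.
total_sales = sum(columns["sales"])
```

Analytic queries usually read a few fields across many records, so keeping each field's values together means less data has to be read per query.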

2. Data Management

The administrative end of database systems mostly comprises data management. Methods of data management indicate how to:

  • acquire and store data
  • process it effectively
  • validate it
  • ensure its security

Data management covers a large array of topics across many disciplines. Some of the key principles of data management include:

Data Governance: Data governance is used to establish clear roles and processes for data management within an organization.  Organizations utilize data governance to ensure their data is managed in a consistent and accountable manner.

Data Quality: It is important that data is high quality.  Data should be accurate, complete and consistent.  Data scientists ensure data quality through processes like data validation and cleaning to identify and correct errors.
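A minimal sketch of validation and cleaning, assuming hypothetical contact records with name and email fields. The rules here (trim whitespace, lowercase the email, require an "@", drop duplicates) are illustrative choices rather than a standard.

```python
def clean_records(records):
    """Split records into cleaned rows and rejected rows."""
    cleaned, rejected = [], []
    seen_emails = set()
    for record in records:
        # Normalize before validating so " Ada " and "Ada" compare equal.
        name = (record.get("name") or "").strip()
        email = (record.get("email") or "").strip().lower()
        if not name or "@" not in email:
            rejected.append(record)      # incomplete or malformed
        elif email in seen_emails:
            rejected.append(record)      # duplicate of an earlier row
        else:
            seen_emails.add(email)
            cleaned.append({"name": name, "email": email})
    return cleaned, rejected


raw = [
    {"name": " Ada ", "email": "ADA@Example.com"},
    {"name": "Ada L.", "email": "ada@example.com"},
    {"name": "", "email": "x@example.com"},
    {"name": "Grace", "email": "grace.example.com"},
]
cleaned, rejected = clean_records(raw)
```

Keeping the rejected rows, rather than silently discarding them, lets someone review and correct the errors at their source.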

Data Integration: Data integration involves taking data from various sources to create a unified, comprehensive view.  It helps facilitate reporting and decision making.

Data Documentation and Metadata Management: It is important to document data definitions, sources and transformations to ensure the data is understandable and traceable.  This allows for effective data search and discovery.

Data Retention and Archiving: Data storage policies are usually determined by legal and regulatory requirements.  Businesses need to determine how long to archive data since it could be needed for historical or compliance purposes.

Data Monitoring and Auditing: Data monitoring and auditing track how data is accessed and changed so that problems are caught early. Training employees and stakeholders in these practices improves data literacy and fosters a data-driven culture.

3. Integrity Rules

Establishing integrity rules tells database administrators how to securely store, process, and retrieve data while ensuring its accuracy and format over the lifetime of its storage and use. Both the physical and logical integrity of the data are vital, which means the database managers who establish integrity rules must be competent with both logical and physical data structures. There are four main types of integrity rules:

Entity Integrity: Entity integrity requires that each row or record in a table be uniquely identifiable. The primary key column cannot contain null or duplicate values.
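Here is a small sketch of entity integrity using Python's sqlite3 module, with a hypothetical students table. One SQLite-specific detail is assumed: the primary key is declared NOT NULL explicitly, because SQLite historically allows NULL in non-integer primary key columns unless told otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id TEXT PRIMARY KEY NOT NULL,  -- unique and never null
        name       TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO students VALUES ('S001', 'Ada')")


def try_insert(student_id, name):
    """Return True if the row was accepted, False if the key rule rejected it."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", (student_id, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

A duplicate key such as 'S001' and a missing key (None) are both rejected, while a new key such as 'S002' is accepted.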

Referential Integrity: Referential integrity refers to how relationships between tables are maintained. These rules ensure that each foreign key value in one table either matches a primary key value in another table or is null. This helps prevent orphaned or invalid references between different tables.
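The following sketch shows a foreign key being enforced with Python's sqlite3 module. The departments and employees tables are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute(
    "CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES departments(dept_id)  -- may be NULL
    )
""")
conn.execute("INSERT INTO departments VALUES (1, 'Research')")


def add_employee(emp_id, name, dept_id):
    """Return True if the row was accepted, False if the foreign key rejected it."""
    try:
        conn.execute(
            "INSERT INTO employees VALUES (?, ?, ?)", (emp_id, name, dept_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False


def remove_department(dept_id):
    """Return False if deleting would orphan employees that still reference it."""
    try:
        conn.execute("DELETE FROM departments WHERE dept_id = ?", (dept_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

An employee may reference department 1 or no department at all, but not a department that does not exist, and a department cannot be deleted while employees still point to it.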

Domain Integrity: Domain integrity enforces valid data values within each column or attribute of a table. It ensures that data entered into each field adheres to a defined:

  • data type
  • format
  • range

For example, date fields should contain only valid dates, while numeric fields should contain only numbers.
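Data type, format, and range rules can be sketched as CHECK constraints in SQLite. The products table and its specific rules (a six-character SKU, a non-negative numeric price, an ISO-style date) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        -- format: exactly six characters
        sku      TEXT PRIMARY KEY NOT NULL CHECK (length(sku) = 6),
        -- data type and range: a number that is not negative
        price    REAL NOT NULL
                 CHECK (typeof(price) IN ('integer', 'real') AND price >= 0),
        -- format: must round-trip through SQLite's date() unchanged
        added_on TEXT NOT NULL CHECK (added_on IS date(added_on))
    )
""")


def add_product(sku, price, added_on):
    """Return True if every column passed its domain check."""
    try:
        conn.execute(
            "INSERT INTO products VALUES (?, ?, ?)", (sku, price, added_on)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The typeof check matters in SQLite specifically, because a column declared REAL will otherwise still store text that cannot be converted to a number.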

User-Defined Integrity: These rules allow database administrators (DBAs) to define additional constraints.  These rules align with business logic and application requirements.  They are used to enforce complex data validation.
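One way to express a business rule that ordinary key and column constraints cannot capture is a trigger. The sketch below assumes a hypothetical rule, "an order must be paid before it ships", and a made-up orders table, and enforces the rule in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        paid     INTEGER NOT NULL DEFAULT 0,
        shipped  INTEGER NOT NULL DEFAULT 0
    )
""")
# The trigger aborts any update that would ship an unpaid order.
conn.execute("""
    CREATE TRIGGER no_ship_before_payment
    BEFORE UPDATE OF shipped ON orders
    WHEN NEW.shipped = 1 AND NEW.paid = 0
    BEGIN
        SELECT RAISE(ABORT, 'order must be paid before it ships');
    END
""")
conn.execute("INSERT INTO orders (order_id) VALUES (1)")


def mark_paid(order_id):
    conn.execute("UPDATE orders SET paid = 1 WHERE order_id = ?", (order_id,))


def mark_shipped(order_id):
    """Return True if the update was allowed, False if the trigger blocked it."""
    try:
        conn.execute(
            "UPDATE orders SET shipped = 1 WHERE order_id = ?", (order_id,)
        )
        return True
    except sqlite3.DatabaseError:
        return False
```

Shipping order 1 fails until mark_paid(1) has run, so the rule holds no matter which application issues the update.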

4. Backup and Recovery

Data backup and recovery are part and parcel of data management. Backups bring their own challenges in data security and storage: the backup must carry the same integrity as the original dataset, and keeping secure data in more than one place creates an additional point of vulnerability. Backup and recovery specialists must have high levels of skill in:

  • logical and physical data structures
  • data integrity and security
  • data privacy and access methods

There are different types of backup methods.  These include:

Full Backup: A full backup is a copy of the entire database to a backup storage medium.  This provides a complete snapshot at a specific point in time.
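As a small illustration of a full backup, Python's sqlite3 module exposes SQLite's online backup through Connection.backup. The notes table is hypothetical, and both databases are kept in memory here; a real backup target would normally be a file on separate storage.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
source.execute("INSERT INTO notes (body) VALUES ('first'), ('second')")
source.commit()

# Copy the entire database into the backup connection.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Changes made after the backup do not appear in the copy.
source.execute("INSERT INTO notes (body) VALUES ('third')")
source.commit()


def count_notes(connection):
    return connection.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

The copy holds the database as it stood when the backup ran, which is what makes it a snapshot of a specific point in time.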

Incremental Backup: This type of backup only copies the data that has changed since the last backup of any type. This method reduces backup time and storage requirements.

Differential Backup: Differential backup is a copy of all the changes made since the last full backup.
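The difference between incremental and differential backups comes down to which earlier backup is used as the reference point. The sketch below assumes hypothetical file names and uses simple integers as timestamps.

```python
# Hypothetical items and the time each was last modified.
modified_at = {"a.dat": 1, "b.dat": 3, "c.dat": 5}

last_full_backup = 2   # time of the most recent full backup
last_any_backup = 4    # time of the most recent backup of any type


def changed_since(items, since):
    """Names of items modified after the given reference time."""
    return sorted(name for name, when in items.items() if when > since)


# Differential: everything changed since the last FULL backup.
differential = changed_since(modified_at, last_full_backup)

# Incremental: only what changed since the last backup of ANY type.
incremental = changed_since(modified_at, last_any_backup)
```

Here the differential set contains b.dat and c.dat, while the incremental set contains only c.dat. The trade-off is that restoring from incrementals needs the full backup plus every incremental since, whereas restoring from a differential needs only the full backup and the latest differential.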

Snapshot Backup: Some database management systems support snapshot backups. A snapshot is a point-in-time image of the database that is taken without affecting ongoing operations. It can be done during active transactions.

There are several types of recovery that an organization can use to protect its information. These include:

Restoration: Restoration is typically done after a data loss event or complete system failure.  The data is copied from the backup storage and placed in the original or replacement database.

Point-in-Time Recovery (PITR): PITR allows a database to be restored to a specific time in the past. It starts from a backup and then replays the changes recorded in the transaction logs up to the chosen moment.
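The idea behind point-in-time recovery can be sketched with a dictionary standing in for the backup and a list standing in for the transaction log. The log format (timestamp, key, new value, with None meaning "deleted") is an assumption made for this example, not the format of any real database.

```python
def restore_to(backup_state, log, target_time):
    """Rebuild the state as of target_time from a backup plus logged changes."""
    state = dict(backup_state)            # start from the backup, untouched
    for timestamp, key, value in sorted(log, key=lambda entry: entry[0]):
        if timestamp > target_time:
            break                         # stop replaying at the chosen moment
        if value is None:
            state.pop(key, None)          # logged deletion
        else:
            state[key] = value            # logged insert or update
    return state


backup = {"alice": 100}                   # backup taken at time 0
log = [
    (1, "bob", 50),
    (2, "alice", 80),
    (3, "bob", None),
    (4, "alice", 0),                      # the change we want to avoid
]
```

Restoring to time 3 replays the first three entries and stops before the change at time 4, so the unwanted update is never applied.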

Rollback: A rollback is performed to undo a recent transaction that caused an error or inconsistency in the database.  It reverts the database to the way it was before the erroneous transaction was executed.
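A rollback can be sketched with a money transfer in SQLite. The accounts table and its balances are invented, and the transfer deliberately credits before it debits so that a failure leaves a half-finished change for the rollback to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 20)")
conn.commit()


def transfer(src, dst, amount):
    """Move funds in one transaction; undo everything if any step fails."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()   # also reverts the credit that already succeeded
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

An overdraft attempt fails on the debit, and the rollback restores both balances to what they were before the transaction started.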

5. Data Security

Last but certainly not least, data security is of the utmost importance in data systems. Data security analysts must work across a broad array of tools and best practices to secure stored data, including limiting access to stored data with front-end interfaces, flow monitoring, storage monitoring, and much more. Because database security is of such high importance, many jobs in database management that involve cybersecurity are very highly paid.  Some essential aspects of database systems security include:

Access Control:  It is important to implement strong access controls so only authorized users can access the database.  Authentication mechanisms like usernames and passwords or biometrics can be used to help verify user identities.
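A minimal sketch of the two halves of access control, using only Python's standard library: verifying a password against a salted hash, and checking whether a role is permitted to perform an action. The role names, permissions, and iteration count are assumptions for illustration, not recommendations for a production system.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Derive a salted hash so the password itself is never stored."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Authentication: does this password produce the stored hash?"""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Hypothetical roles and the actions each may perform.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "dba": {"read", "write", "admin"},
}


def is_allowed(role, action):
    """Authorization: is this role permitted to perform the action?"""
    return action in ROLE_PERMISSIONS.get(role, set())


stored_salt, stored_digest = hash_password("correct horse")
```

Authentication establishes who the user is, and authorization then decides what that user may do; unknown roles get no permissions by default.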

Encryption: Encryption techniques can protect data both in transit and at rest.  Encryption ensures that the data remains unreadable if it is intercepted or stolen.

Patch Management: DBAs should keep their systems and software up to date with the latest security patches. Applying security updates addresses known vulnerabilities and reduces the risk of exploitation.

Secure Coding Practices: Following secure coding practices when developing and deploying database applications can prevent common security vulnerabilities, including SQL (Structured Query Language) injection and cross-site scripting attacks.
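SQL injection is easiest to understand side by side with its fix. The sketch below uses Python's sqlite3 module and a hypothetical users table to contrast a query built by string formatting with a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY NOT NULL, secret TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_user_unsafe(username):
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = f"SELECT username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(username):
    """Parameterized: the input is passed as data and never parsed as SQL."""
    return conn.execute(
        "SELECT username FROM users WHERE username = ?", (username,)
    ).fetchall()


# Input crafted to turn the WHERE clause into a condition that is always true.
malicious = "nobody' OR '1'='1"
```

With the malicious input, the unsafe version returns every user in the table, while the safe version treats the whole string as a username and finds nothing.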

Conclusion

These database principles, among others, each embody different specializations and different areas of skill, and each presents different challenges. Whatever a student's aptitude and interests, each of these principles of database management is worth exploring as a specialization in professional practice.
