Open Grow Close
Evolve
Openness, Growth, Evolution, Openness, Growth, Evolution,
and Closure in Archival and Closure in Archival
Information Systems Information Systems
Lessons from NARA’s Experience
September 2008
Kenneth Thibodeau, Director
Electronic Records Archives Program
National Archives and Records Administration
IEEE Symposium on Mass Storage Systems & Technologies
Keys to the Digital Future
Open Grow Close
Evolve
Archival Information System
³Conceptually: “an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a
Designated Community.”
• ISO Reference Model for an Open Archival Information System (OAIS). ISO 14721:2003
³Empirically: the National Archives’ Open Archival Information System, the
Electronic Records Archives
Open Grow Close Evolve
What is the Electronic
Records Archives (ERA)?
³ERA is the system the National Archives and Records Administration (NARA) is developing to
²Reengineer and automate the lifecycle management of all types of records of the U.S. Government
²Preserve and provide sustained access to
electronic records of the U.S. Government
Open Grow Close
Evolve
ERA Development Timeline
Full Operating Full Operating
Capability Capability
ERA Base System
ERA Search & Access System
Enhancement
Initial Operating Capability (IOC) 6/08
Operation & Maintenance
9/05 9/06 9/07 9/08 9/09 9/10 9/11
Enhancement Enhancement
Enhancement
Open Grow Close Evolve
ERA Base System Development
³Focus:
²Federal Records
²National Archives
³IOC Functions (2008) :
²Creation, review and approval of records schedules
²Requests to transfer records, transfer of physical and legal custody
²Transfer, inspection, and archival storage of
electronic records
Open Grow Close
Evolve
Initial Users
Open Grow Close Evolve
ERA Search and Access System Development
³Initial Focus:
²Electronic records of the Executive Office of the President, G W. Bush
²Presidential Libraries
² 100 TB
³Functions:
²Rapid ingest & indexing
• Transformation to more accessible form.
²Archival storage
²Full content search
²Basic case management for special requests
Open Grow Close
Evolve
Future Development Future Development
³Public Access to
²Any information about records
• Ordering of copies of records
²Electronic records stored in the system
³Long-term preservation of electronic records
²Ability to use a variety of techniques simultaneously and over time
³Review and redaction of sensitive content
³Support for Federal Records Centers
³Exponential growth in stored data
Keys to the Digital Keys to the Digital
Future Future
³ ³ Openness Openness
³ ³ Growth Growth
³ ³ Evolution Evolution
³ ³ Closure Closure
Lessons from the ERA experience
Open Grow Close Evolve
Open Grow Close
Evolve
Openness Openness
³An Archival Information System needs to be open to
²New types of electronic records
²Rising and changing user expectations
²Creative approaches to meeting the
challenges of electronic records and
demanding users.
Open Grow Close
Evolve
Openness Openness
³An Archival Information System needs to be open to
²New types of electronic records
²Rising and changing user expectations
²Creative approaches to meeting the
challenges of electronic records and
demanding users.
Open
²New Types of Records:
Geographic Information Systems
Open
²New Types of Records:
Product Data
Computer Assisted Computer Assisted
Design Engineering
Product Computer
Analysis and Testing Assisted Manufacture
Open
²New Types of Records:
Critical Infrastructure Data
Source: CLindberg http://commons.wikimedia.org/wiki/Image:I35_Bridge_Collapse_4crop.jpg
Open
²New Types of Records:
Virtual Reality: Crime Scene Investigation
Open
²New Types of Records:
Medical Tests and Observations
Open Grow Close
Evolve
Openness Openness
³An Archival Information System needs to be open to
²New types of electronic records
²Rising and changing user expectations
²Creative approaches to meeting the
challenges of electronic records and
demanding users.
Open Grow Close
Evolve
Openness Openness
³An Archival Information System needs to be open to
²New types of electronic records
²Rising and changing user expectations
²Creative approaches to meeting the
challenges of electronic records and
demanding users.
² Rising and Changing User Expectations
Open
Open Grow Close
Evolve
Openness Openness
³An Archival Information System needs to be open to
²New types of electronic records
²Rising and changing user expectations
²Creative approaches to meeting the
challenges of electronic records and
demanding users.
Open Grow Close
Evolve
Openness Openness
³An Archival Information System needs to be open to
²New types of electronic records
²Rising and changing user expectations
²Creative approaches to meeting the
challenges of electronic records and
demanding users.
Open
Creative Approaches
³The conceptual apparatus we bring to bear on
– The nature of records
– Requirements for preserving records
– Requirements for serving users
Open
² Creative approaches: Partnerships
National Science
Foundation
NIST
San Diego Supercomputer
Center National Computational
Science Alliance Global
Grid Forum
Army Research Laboratory
Keys to the Digital Keys to the Digital
Future Future
³Openness
³ ³ Growth Growth
³Evolution
³Closure
Open Grow Close Evolve
Open Grow Close
Evolve
Growth Growth
³An Archival Information System needs to be able to grow to
² Process, store and provide access to
increasing volumes of electronic records
² Accommodate increasing numbers of users
and frequency of use
Grow
²Increasing Volumes of Digital Information
• In 2006, the amount of digital information created, captured, and replicated was …281 exabytes or 281 billion gigabytes. This is about 3 million times the
information in all the books ever written.
• By 2011, the digital universe will be 10 times the size it was in 2006.
• Not all information created and transmitted gets stored, but by 2011, almost half of the digital universe will not have a permanent home.
• The number of electronic information “containers” — files, images, packets, tag contents — is growing 50%
faster than the number of gigabytes. The information created in 2011 will be contained in more than 20 quadrillion — 20 million billion — of such containers
– IDC. The Diverse and Exploding Digital Universe. An Updated
Forecast of Worldwide Information Growth Through 2011. March 2008
Grow
Transfers of Digital Files to NARA
16,000 14,000 12,000 10,000 8,000 6,000 4,000 2,000
0 1970-1988 1989-1995
Grow
Transfers of Digital Files to NARA
250,000
200,000
150,000
100,000
50,000
0 1970-1988 1989-1995 Reagan/ Bush
Grow
Transfers of Digital Files to NARA
0 5,000,000 10,000,000 15,000,000 20,000,000 25,000,000 30,000,000 35,000,000
1970-1988 1989-1995 Reagan/ Bush Clinton
Grow
² Planning for Open-ended Growth
DREN Internet
NARANET
Public
SAS ERA
Future ERA Instance
System
Management
Other Agencies
Instance Instance
Base ERA
Federation
Keys to the Digital Keys to the Digital
Future Future
³Openness
³Growth
³ ³ Evolution Evolution
³Closure
Open Grow Close Evolve
Open Grow Close
Evolve
Evolution Evolution
³An Archival Information System needs to be able to evolve in
response to
²Changing Information Technology
• Obsolescence
• Opportunities
²Changing business requirements
Open Grow Close
Evolve
Evolution Evolution
³An Archival Information System needs to be able to evolve in
response to
²Changing Information Technology
• Obsolescence
• Opportunities
²Changing business requirements
Evolve
²Obsolescence of Formats of Electronic Records
• Strategy: Preservation and Access Levels
– Common:
• Retain records in original formats
– Basic Level:
• Use original or contemporary software for access
– Enhanced Level
• Create new version in current format, or
• Use new software for access to original format
– Ideal Level
• Create version in persistent format, or
• Create persistent software for management and access
Evolve
²Obsolescence of Formats of Electronic Records
• ERA System Architecture:
– Does not prescribe specific preservation solutions – Allows a variety of different software tools to be
introduced and used for different formats.
– Enforces archival requirements
Preservation Framework
Record:
Correspondence
Attached Photograph
Proprietary to standard image format converter
Standard form Photograph E-mail to
structured data converter
E-mail Header
E-mail Message
Example
Open Grow Close
Evolve
Evolution Evolution
³An Archival Information System needs to be able to evolve in
response to
²Changing Information Technology
• Obsolescence
• Opportunities
²Changing business requirements
Open Grow Close
Evolve
Evolution Evolution
³An Archival Information System needs to be able to evolve in
response to
²Changing Information Technology
• Obsolescence
• Opportunities
²Changing business requirements
Evolve
Preservation Options
SpecificGeneral
Maintain original technology
Object Interchange
Format Persistent
Object
Version Migration
Format Standardization
Typed Object Conversion Rosetta Stone
Translation
Viewer Re-engineer
Software Universal
Virtual Computer
Programmable Chips Emulation
Static Translator
Dynamic Translator
Virtual Machine
Multi- Valent Document
Preserve Method Preserve
Technology Objects
Evolve
² Changing Information Technology:
Service Oriented Architecture
Archive Storage VLAN System/Business Application VLAN External Access
Gateway (DMZ Access)
ERA Management VLAN
Ingest VLAN WebServer VLAN
Ingest Working Storage
Archive Disk Storage (Managed Storage)
To Greenbelt ERA Mgt
Firewall + IDS HW Load Balancer
Ingest Server (App Server)
Unix Input Workstation Physical Ingest Ethernet Switch
Windows Input Workstation Printer
DVD Tape Drive CD ROM
Printer
DVD Tape Drive CD ROM HW Load
Balancer Web Server
Ethernet Switch
Archive Label Printing WS
Printer
Archive Tape Storage
Mgt Router (w/IPSec)
Accountability Server
Accountability WS
Backup Restore Server
Enterprise System Mgt
Server
Help Desk Server
Network Mgt Server
Security Server (Code Modification)
Security Server (Vulnerability Scan)
Software Deployment
Server Central Data
Manager Server
Data Services Server (Mgmt DB Server) ERA
Extranet
GOVT Agencies NARA
Core Switch (w/FW + IDS + LB) Firewall + IDS
Firewall + IDS
Firewall + IDS Firewall + IDS
Ethernet Switch Database
Server (T&I)
SOC
SOC WS Ethernet
Switch
Firewall + IDS
Ethernet Switch Firewall
+ IDS Documentum
Server
Strong Auth Server DNS/
Directory Server Database
Server (SBA)
Asset Mgnt WS App
Server (SBA)
Instance Datastore
FC Switch
Mgmt LAN, Storage Network
Backup Network Ops LAN, Management LAN,
Storage Network
Ops LAN, Management LAN,
Storage Network Ops LAN,
Management LAN, Storage Network Ops LAN,
Management LAN
Extranet Router (w/FW + IDS + IPSec) Firewall + IDS
Firewall + IDS
Firewall + IDS Firewall + IDS
Derived from:
ERA Hardware Block Diagram -2007 0823 (Tab: I1R2 U/USBU Detailed Block) Updated 24 Aug 2007
1Gb Ethernet 2/4Gb Fibre Channel
Evolve
²Service Oriented Architecture
As Built
EOP ERA System/Business Application
Enclave
DMZ Enclave Management Enclave
Ingest Working Storage
Instance Data store
To Greenbelt WAN
HW Load Balancer
DVD Tape Library
CD ROM HW Load Balancer
FC Switches
Mgt Router (w/IPSec)
Firewall + IDS
Log Logic Applicance Storage Mgt Server
Backup Restore Master Server
Enterprise System Mgt
Server Help Desk Server
Network Mgt Server Security Server
(Code Modification) Security Server
(Vulnerability Scan) Software Deployment
Server Extranet Router
(w/FW + IDS + IPSec) Firewall + IDS
Firewall + IDS
Firewall + IDS Firewall + IDS
HCAP Storage Nodes
Mgmt DB Server
ERA Extranet
GOVT Agencies NARA
Web/
Application Server
Base ERA Web Server Ethernet
Switch 1
Database Servers
HCAP Search Nodes
Primary Archive Storage Base ERA
Ingest DB Server
Ethernet Switches
Backup/ Restore Media Server ECM
Server
Tape Library
Ingest Workstation Enclave (Base ERA)
Firewall + IDS
Physical Ingest Workstations Physical
Ingest Printers
SOC Enclave Operational Services
Enclave
DNS Server Active Directory
Server LDAP and
Radius Server
Strong Authentication
Server
SOC Workstation Firewall + IDS
USP/V
Base ERA
Managed Storage Label
Printer
Central Data Management Servers Ethernet
Switch 2 Ethernet
Switch 1
Base ERA Ingest Server
EOP Ingest Workstations
EOP Ingest Servers
Ethernet Switch 1
Ingest Security Review WS
Tape Subsystem
Archival Storage Enclave
Fiber Channel Switches Core Switch Backbone
(w/FW + IDS + LB) 1st Floor EOP, 2nd Floor Base Firewall + IDS
Firewall + IDS
Firewall + IDS Firewall + IDS
Ethernet Switch
Separate Dedicated Physical Disks for Base ERA & ERA EOP
Application Servers
Database Server
Separate Dedicated Physical Disks for Base ERA & ERA EOP
Backup/Restore Media Servers may be separate & dedicated between (Base ERA & EOP ERA)
Color = Unchanged from Base
Infrastructure Appl Server
Management LAN Management LAN
Management LAN
Firewall + IDS
Ethernet Switch 2
Network Compliance Mgt
Server Ethernet Switch
Replica Archive Storage Asset Mgt
Workstation
Archive Label Printing Workstation
Color = Changed from Base Color = New for EOP
Evolution Evolution
³An Archival Information System needs to be able to evolve in
response to
– Changing Information Technology
• Obsolescence
• Opportunities
– Changing business requirements
Evolution Evolution
³An Archival Information System needs to be able to evolve in
response to
– Changing Information Technology
• Obsolescence
• Opportunities
– Changing business requirements
² Evolution of Business
Evolve
Requirements
Technical Requirements Solutions
Evolve
Records Schedule: Current
Evolve
Create Schedule Item
Temporary Records Permanent Records
Keys to the Digital Keys to the Digital
Future Future
³Openness
³Growth
³Evolution
³ ³ Closure Closure
Open Grow Close Evolve
Open Grow Close
Evolve
Closure Closure
³An Archival Information System
needs to be able to provide closure to ensure
²Preservation and presentation of authentic records
²Comprehensive lifecycle management of electronic records
²Consistency with well-established
archival science
Close
ERA: a Set of Nested Systems
• Outer system
lifecycle management of records of all types
• Inner Electronic Records System
Ingest, preservation, disposition, and access to electronic records
• Search & Preservation Frameworks
Support a variety of different approaches to different needs
• Archival “mini-systems”
Specific, systematic management for each series or aggregate of electronic records
Close
Document v. Record
• A document is a bounded physical representation of a body of information designed with the capacity (and usually intent) to communicate. A
document may manifest symbolic, diagrammatic or sensory-representational information. ...
– en.wikipedia.org/wiki/Docume nt
• The information communicated by a document depends on its content and structure.
• A record is a document made or received in the course of a practical activity as an
instrument or a by-product of such activity, and set aside for action or reference.
– http://www.interpares.org/ip2/i p2_terminology_db.cfm
• The information communicated by a record depends on its
content, structure, and context.
Close
Document
What does this document tell us?
Record Record
Close
What does this document tell us about the U.S. Government?
Close
Archival Aggregate as Directed Graph
Every record has an ‘archival bond,’ the set of
relationships established by an actor between that record and other records of the actor’s activity.
Close
Preservation
• Documents can be preserved as individual objects
• Records can only be preserved as ordered sets.
ĺAn Archival Information System for records must ensure that
• Submission Information Packages,
• Archival Information Packages and
• Dissemination Information Packages
are managed to respect the original order of records.
Electronic Records
Close
May be instantiated as subsets
Open
of complex ordered sets
Close
ERA as a Set of Mini-Systems
A Lifecycle Management Plan for a Records Aggregate, such as a series, defines a “Mini-system;” i.e., systematic controls for that aggregate stretching from ingest to dissemination.
Open Grow Close
Evolve
SUMMARY
³Openness
²New types of electronic records
²Rising and changing user expectations
²Creative approaches
³Growth
²Exponential increase in volumes of data
²Increasing users and usage
³Evolution
²Changing Information Technology
²Changing business requirements
³Closure
²Preservation of electronic records as members of ordered sets.
Open Grow Close Evolve