The process of migrating data consists of steps converting the relational tables to the CouchDB structure defined below.
Determine the CouchDB document structure i.e. fields required in the document Extract and examine the MySQL tables structure i.e. determine existing fields in the
76
Define the document structure based on the above steps i.e. define the fields required in a particular CouchDB document from one or more MySQL database tables
Create CSV file containing data from a source table using a script
Embed the final structure of CouchDB document in the PHP code block. This code is read during the migration operation and a document is created in the CouchDB according to the defined structure.
Create document and import data in CouchDB using CVS file and CouchDB libraries Figure 6-2 below shows a simple partial code example of embedded structure in PHP.
Figure 6 - 2: PHP code with embedded structure of the CouchDB document
The process is currently manual but can be automated in future. The definitions of the objects are extracted by looking at each object individually and documenting it. These definitions are
77
obtained either by using a reverse engineering or by just defining the MySQL table inside the database. The process starts with creating a CSV file with data from the source table.
The process of converting the RDBMS table starts after the table definition is collected from the RDBMS. Data in the RDBMS is manipulated using the Structured Query Language (SQL) through customized client software or a module such as MySQL Workbench. The CouchDB allows use of the RESTful HTTP API to communicate with the database and provides the user with access to the information. Representational State Transfer (REST) is an architecture style that makes it possible to access data using web services that are implemented in the Hypertext Transfer Protocol (HTTP). REST was introduced by Roy Fielding in 2000. Everything in a REST base architecture is a resource and these resources are accessed by common interface based HTTP standard methods GET, POST, PUT, and DELETE. A resource can be anything e.g. a document residing on a server in a file system or a table row in a database and these resources are accessed by Uniform Resource Identifier (URI). In the REST architecture the client can access and modify the resource on a server. The server provides and controls the access to the resource. REST supports all common HTTP operations and allows the resources to have different representational styles such as XML, JSON, and HTML etc. A client access the representation by using the URI.
The Extensible Markup Language (XML), HyperText Markup Language (HTML), and JavaScript Object Notation (JSON) are popular formats of data for the RESTful applications. The method in this research uses a JSON representation embedded in a PHP code block which interacts with the CouchDB using the CouchDB libraries for PHP. The structure of the documents is defined in the PHP code block and data is read from the CSV file. Using the CouchDB API, an HTTP request is issued to the CouchDB server using GET, POST, PUT, or
78
DELETE methods. The data is passed along with the request method and a new document is created in the CouchDB. This method provides precise control over the unique identifier of the new document. It allows assigning a meaningful and unique identifier to the CouchDB documents. The unique identifier _id is automatically assigned to a new document, if not defined explicitly.
This utility is flexible and provides the facility to provide a hardcoded value for each key entered through a form. Another method is to read an existing file with exported data using the API form. This operation can potentially be automated in future research.
Documents in CouchDB are created with a key-value structure. The keys in this case are the name for the table’s fields and values are the data in the table. The uniqueness of each document is maintained by giving it a meaningful name.
The figure 6-3 below explains how the RDBMS table converts into the CouchDB document. This also shows how table fields are mapped to counterpart keys in a document system and how the same data is assigned to each key as a value. This explanation uses an actual table of the CS system that is converted into a CouchDB document with identical data.
A table ‘USERS’ with all users’ information is populated with 3 rows. This table generates three documents in the CouchDB when converted. Each row in the table is mapped with exactly one document. The fields’ name and data are identical. Each document name is unique as explained earlier. The NoSQL DBs have no concept of Primary Key so the PK column is not required in the documents. Each newly created document is assigned a uniquely identified document through the ‘_id’ key. The name of the document identifies it and could also be considered equivalent to a table name.
79
Figure 6 - 3: Mapping RDBMS table with CouchDB document
The first scenario described in Figure 6-3 above is good if the data is being imported from one MySQL table but the CouchDB model is semi-structured and also de-normalized as compared to the MySQL tables. It is common that due to the de-normalized nature of the data structure, the CouchDB documents contain fields for which the data may be coming from multiple tables. When this situation is encountered then the desired field from the MySQL tables should be mapped to the corresponding fields in the CouchDB documents. The rest of the process is identical as if we import the data from a single table. To use this approach the programmer has to be aware of the CouchDB data model and document structure. Consider a document in the CouchDB that is required to contain information about the review process of a submitted document. The CouchDB document structure contains information as shown in figure 6-4 below.
80
Figure 6 - 4: Structure of a CouchDB document
The record that populates the above mentioned CouchDB document is shown in Figure 6-4 in parts for clarity below.
81
review_end_time review_start_time review_status reviewed_document_dir 2012-12-23 13:11:47 2013:11:18 12:54:57 done 2390
Reviewed_document_id Reviewed_document_name Reviewed_document_url Reviewer_name
8 Using CouchDB https://ashikpc.usask.ca.. Lynn Lee
Source_doc_author Source_doc_id Source_doc_name Source_doc_url
Tim May 1 Using CouchDB https://ashikpc.usask.ca.
Table 6 - 1: A record in MySQL
82
Figure 6 - 5: MySQL record as CouchDB document after data migration
The information in this document is coming from multiple MySQL tables including USERS, DOCUMENTS, REVIEWED_DOCUMENTS, and REVIEW_STATUS. The structures of these tables are represented in figures 6-6, 6-7, 6-8, and 6-9 respectively. To import data in this scenario, proper mapping of table fields with the document fields is essential. Each row in the tables creates one new document in CouchDB with the fields as controlled in the coding.
83
Figure 6 - 6: Structure of USERS table in MySQL
Figure 6 - 7: Structure of DOCUMENTS table in MySQL
84
Figure 6 - 9: Structure of REVIEW_STATUS table in MySQL
Figure 6-10 below represents how the MySQL table fields are mapped to the CouchDB document fields during the process of import.
85
86 CHAPTER 7 EVALUATION
The main focus of this research is migrating data successfully without any data loss or data corruption and then to use the migrated data to perform identical operations and get identical results. The goal G1 is achieved successfully if the data has been moved from the MySQL to the CouchDB successfully without any data loss or corruption. The goal G2 merely depends on the success of the goal G1 and is achieved successfully if the same operations i.e. SELECT, INSERT, UPDATE, and DELTE that were performed in MySQL DB produce the same results with the migrated data. There is no direct dependency of the goals G3 and G4 on goals G1 and G2 but the success of the first two goals confirms that the code written to achieve the goals G1 and G2 is correct. Once it is confirmed that the code is correct then we can use it to achieve the goal G4 by comparing the codes written for both the MySQL and the CouchDB to perform the same operations. The goal G3 is totally independent of the other goals and depends on other factors too.