7. Sequoia controller

7.1. Design Overview

The Sequoia controller is made of several components as shown in Figure 2, “Sequoia controller design overview”. The controller hosts virtual databases. A virtual database gives the illusion of a single database to the user. It exports the same database name and login/password as those used in the client application. Therefore the client application can run unmodified with Sequoia.

When the client application connects to the database using a URL such as jdbc:sequoia://host:25322/myDB, the Sequoia driver tries to connect to a Sequoia controller running on port 25322 on node host. Once the connection is established, the login and password are sent along with the myDB database name to be checked by the controller.
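
For illustration, a client could open a connection through the controller with standard JDBC code like the following sketch (the host name, database name, login and password are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class SequoiaClientExample {
    public static void main(String[] args) throws Exception {
        // Register the Sequoia driver, then connect as with any other JDBC driver.
        Class.forName("org.continuent.sequoia.driver.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:sequoia://host:25322/myDB", "user", "password");
        // ... issue statements exactly as against a single database ...
        conn.close();
    }
}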

A virtual database contains the following components:

  • authentication manager: it matches the virtual database login/password (provided by the application to the Sequoia driver) with the real login/password to use on each backend. The authentication manager is only involved at connection establishment time.

  • backup manager: manages a list of generic or database-specific Backupers that are in charge of performing database dump and restore operations. Backupers should also take care of transferring dumps from one controller to another.

  • request manager: it handles the requests coming from a connection with a Sequoia driver. It is composed of several components:

    • scheduler: it is responsible for scheduling the requests. Each RAIDb level has its own scheduler.

    • request caches: these are optional components that can cache query parsing, the result set and result metadata of queries.

    • load balancer: it balances the load on the underlying backends according to the chosen RAIDb level configuration.

    • recovery log: it handles checkpoints and allows backends to dynamically recover from a failure or to be dynamically added to a running cluster.

  • database backend: it represents the real database backend running the RDBMS engine. A connection manager mainly provides connection pooling on top of the database JDBC native driver.

Figure 2. Sequoia controller design overview


Each virtual database and its components are configured using an XML configuration file that is sent from the administration console to the Sequoia controller.

Note

A research report details RAIDb and the C-JDBC implementation. Other documents and presentations about C-JDBC can be found in the documentation section of the web site.

7.2. Starting the Controller

The bin directory of the Sequoia distribution contains the scripts to start the controller. Unix users must start the controller with controller.sh whereas Windows users will use controller.bat.

Sequoia controller startup is tuned via a configuration file, called controller.xml, located in the config/controller directory of your Sequoia installation. A simple configuration file looks like this:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE SEQUOIA-CONTROLLER PUBLIC "-//Continuent//DTD SEQUOIA-CONTROLLER 2.10//EN"  "http://sequoia.continuent.org/dtds/sequoia-controller-2.10.dtd">
<SEQUOIA-CONTROLLER>
  <Controller port="25322">
    <Report hideSensitiveData="true" generateOnFatal="true"/>
    <JmxSettings>
      <RmiJmxAdaptor/>
    </JmxSettings>
  </Controller>
</SEQUOIA-CONTROLLER>
      

You can specify at startup a configuration file other than config/controller/controller.xml. This is useful if you have to start many identical controllers from the network. You can then use the command controller.sh -f filename on Unix machines or controller.bat -f filename on Windows.
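
For example, on a Unix machine you could point a controller at a node-specific configuration file (the path below is hypothetical):

controller.sh -f /etc/sequoia/controller-node1.xml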

For more information, you can refer to the controller-configuration.xml example in the example directory of Sequoia.

The next section describes how to write a controller configuration file.

7.3. Writing the controller configuration file

The controller is entirely configurable via an XML file, by default controller.xml, located in the config/controller directory of the Sequoia installation. This section details how to write such a file.

7.3.1. Controller Parameters

The root element of the controller configuration is defined as follows:

<!ELEMENT Controller (Internationalization?, Report?, JmxSettings?, 
                                    VirtualDatabase*, SecuritySettings?)>
<!ATTLIST Controller
  port             CDATA "25322"
  ipAddress        CDATA "127.0.0.1"
  backlogSize      CDATA "10"
>
    

All sub-elements of Controller are defined in the next sections. Here is a brief overview of each of them:

  • Internationalization: defines the language setting for Sequoia console and error messages.

  • Report: if this option is enabled, Sequoia can automatically generate a report on fatal errors or shutdown. If you experience any problem with Sequoia, you can send the report directly to the mailing list to get a quick diagnosis of what happened.

  • JmxSettings: JMX is the technology used for management and monitoring in Sequoia. These functionalities can be accessed through HTTP with an Internet browser or through the RMI connector used by the Sequoia console.

  • VirtualDatabase: Defines a virtual database to load automatically at controller startup given a reference to its configuration file.

  • SecuritySettings: Allows filtering access to the controller based on access lists.

The attributes of a Controller element are defined as follows:

  • port: the port number on which clients (Sequoia drivers) will connect. The default port number is 25322.

    Note

    A port number below 1024 will require running the controller with privileged rights (root user under Unix).

  • ipAddress: This can be defined to bind to a specific IP address in case of a host with multiple IP addresses. It can be omitted if there is only one IP address available; if omitted, it defaults to 127.0.0.1.

  • backlogSize: the server socket backlog size (number of connections that can wait in the accept queue before the system returns "connection refused" to the client). Default is 10. Tune this value according to your operating system, but the default value should be fine for most settings.

If your machine has multiple network adapters, you can force the Sequoia controller to bind to a specific IP address like this:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE SEQUOIA-CONTROLLER PUBLIC "-//Continuent//DTD SEQUOIA-CONTROLLER 2.10//EN"  "http://sequoia.continuent.org/dtds/sequoia-controller-2.10.dtd">
<SEQUOIA-CONTROLLER>
  <Controller port="25322" ipAddress="192.168.0.1">
    <JmxSettings enabled="false"/>
  </Controller>
</SEQUOIA-CONTROLLER>
    

7.3.2. Internationalization

You can use this element to override the default locale retrieved by Java. English is the only language supported at the moment.

    <!ELEMENT Internationalization EMPTY>
    <!ATTLIST Internationalization language (en|fr|it|jp) "en">
    

7.3.3. Report

A report can be defined if you want to get a trace of what happened during the execution of the controller. If this element is included in controller.xml, reporting is enabled and a report will be written, under certain conditions, to a file named sequoia.report.

<!ELEMENT Report EMPTY>
<!ATTLIST Report
     hideSensitiveData  (true|false) "true"
     generateOnShutdown (true|false) "true"
     generateOnFatal    (true|false) "true"
     enableFileLogging  (true|false) "true"
     reportLocation     CDATA        #IMPLIED
>
    
  • hideSensitiveData: replaces passwords with '*****' in the report.

  • generateOnShutdown: tells the controller to generate a report when it has received a shutdown command.

  • generateOnFatal: tells the controller to generate a report when it cannot recover from an error.

  • enableFileLogging: logs all the console output into a file and includes this file in the report.

  • reportLocation: specifies the path where the report is created; the default is the SEQUOIA_HOME/log directory.
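
As an illustration, a Report element combining these attributes could look like the following sketch (the reportLocation path is a placeholder):

<Report hideSensitiveData="true"
        generateOnShutdown="true"
        generateOnFatal="true"
        enableFileLogging="true"
        reportLocation="/var/log/sequoia"/>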

7.3.4. JMX

JMX is used to administer the controller remotely. You can use the bundled Sequoia console or your own code to access JMX MBeans via a protocol adaptor. Sequoia provides both the RMI and HTTP adaptors of the MX4J JMX server. You can override the default port numbers for each adaptor if they conflict with another application that is already using them (e.g. another Sequoia controller on the same machine).

    <!ELEMENT JmxSettings (HttpJmxAdaptor?, RmiJmxAdaptor?)>
    <!ELEMENT HttpJmxAdaptor EMPTY>
    <!ATTLIST HttpJmxAdaptor
      port CDATA "8090"
    >

     <!ELEMENT RmiJmxAdaptor (SSL?)>
     <!ATTLIST RmiJmxAdaptor
       port         CDATA        "1090"
       username     CDATA        #IMPLIED
       password     CDATA        #IMPLIED
     >


     <!ELEMENT SSL EMPTY>
     <!ATTLIST SSL
       keyStore            CDATA        #REQUIRED
       keyStorePassword    CDATA        #REQUIRED
       keyStoreKeyPassword CDATA        #IMPLIED
       isClientAuthNeeded  (true|false) "false"
       trustStore          CDATA        #IMPLIED
       trustStorePassword  CDATA        #IMPLIED
     >
     

The SSL element configures SSL for encryption and/or authentication:

  • keyStore: the file where the keys are stored.

  • keyStorePassword: the password for the keyStore.

  • keyStoreKeyPassword: the password for the key; if none is specified, the keyStore password is used.

  • isClientAuthNeeded: if set to false, SSL is used for encryption only; if set to true, the server only accepts trusted clients (the client certificate has to be in the trust store).

  • trustStore: the file where the trusted certificates are stored; if none is specified, the keyStore is used.

  • trustStorePassword: the password for the trustStore; if none is specified, the keyStore password is used.
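
For illustration, a JmxSettings section enabling the RMI adaptor over SSL could look like the following sketch (the keystore path and passwords are placeholders):

<JmxSettings>
  <RmiJmxAdaptor port="1090">
    <SSL keyStore="/etc/sequoia/controller.keystore"
         keyStorePassword="changeit"
         isClientAuthNeeded="false"/>
  </RmiJmxAdaptor>
</JmxSettings>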

You have to enable the RMI adaptor if you want to use the Sequoia console to administer the controller remotely. To enable the RMI JMX adaptor, use this setting:

     <JmxSettings>
       <RmiJmxAdaptor/>
     </JmxSettings>
     

7.3.5. Virtual Database

This element specifies virtual databases to load at controller startup.

<!ELEMENT VirtualDatabase EMPTY>
<!ATTLIST VirtualDatabase 
    configFile          CDATA #REQUIRED
    virtualDatabaseName CDATA #REQUIRED
    autoEnableBackends  (true | false | force) "true"
    checkpointName      CDATA ""
>
      
  • configFile: The path to the virtual database configuration file. See Section 10, “Virtual database configuration” to learn how to write a virtual database configuration file.

  • virtualDatabaseName: The name of the virtual database since the configuration file can contain multiple virtual database definitions.

  • autoEnableBackends: set to true by default to re-enable backends from their last known state as stored during the last shutdown. If backends were not properly shut down, nothing will happen. You can specify false to leave the backends in the disabled state at startup. The force option should only be used if you know exactly what you are doing: it overrides the backend status by providing a new checkpoint. Warning! Use this setting carefully as it might break your database consistency if you do not provide a valid checkpoint. Force is treated the same as true if no recovery log has been defined.

  • checkpointName: the checkpoint name to use with the recovery log to enable the backend from a known coherent state. If the checkpoint is omitted, the last known checkpoint is used.

Example:

<VirtualDatabase configFile="/databases/MySQLDb.xml" virtualDatabaseName="rubis" autoEnableBackends="true"/>
      

This loads a virtual database named rubis from the configuration file /databases/MySQLDb.xml and enables all backends of the database from their last known checkpoint.

7.3.6. Security

Security settings define the policy to adopt for some functionalities that may compromise the security of the controller. These settings depend on your environment and can be relaxed if you are running in a secure network. The fewer security checks you enable, the faster the controller will run. A SecuritySettings element is defined as follows:

      <!ELEMENT SecuritySettings (Jar?, Accept?, Block?, SSL?)>
      <!ATTLIST SecuritySettings
        defaultConnect (true|false) "true"
      >
      

defaultConnect: is used to allow (true) or refuse (false) connections to the controller. This default setting can then be tuned with access lists defined in Accept and Block elements (see below).

Additional database drivers can be uploaded dynamically to the controller. As the controller has no way to check whether this is a real JDBC driver or some malicious code hidden behind a JDBC driver interface, you have to be very careful when enabling this option if anybody can connect to your controller from anywhere.

<!ELEMENT Jar EMPTY>
<!ATTLIST Jar
	allowAdditionalDriver (true|false) "true"
>
      

You can control who can connect to the controller by setting access lists based on IP addresses to accept or block. defaultConnect is set in SecuritySettings defined above. Default is to accept all connections if no security manager is enabled.

<!ELEMENT Accept (Hostname|IpAddress|IpRange)*>
<!ELEMENT Block (Hostname|IpAddress|IpRange)*>

<!ELEMENT Hostname EMPTY>
<!ATTLIST Hostname 
     value CDATA #REQUIRED
>
      

IpAddress value is an IPv4 address (e.g. 192.168.1.12):

<!ELEMENT IpAddress EMPTY>
<!ATTLIST IpAddress  
     value CDATA #REQUIRED
>
      

IpRange value is based on IPv4 addresses and has the following form: 192.168.1.*.

<!ELEMENT IpRange EMPTY>
<!ATTLIST IpRange  
     value CDATA #REQUIRED
>
      

Here is a full security configuration example:

<SecuritySettings defaultConnect="false">
  <Jar allowAdditionalDriver="true"/>
  <Shutdown>
    <Client allow="true" onlyLocalhost="true"/>
    <Console allow="false"/>
  </Shutdown>
  <Accept>
    <IpRange value="192.168.*.*"/>
  </Accept>
</SecuritySettings>
      

This setting accepts driver connections only from machines having an IP address starting with 192.168, allows loading of additional drivers via the console, refuses shutdown from the console, but allows it from the local machine.

7.4. Configuring the Log

Sequoia uses the Log4j logging framework. The log4j.properties configuration file is located in the /sequoia/config directory of your installation. Here is a brief description of the loggers available in the configuration file:

  • log4j.logger.org.continuent.sequoia.core.controller : Controller related activities mainly for bootstrap and virtual database adding/removal operations.

  • log4j.logger.org.continuent.sequoia.controller.xml.Handler : XML configuration file parsing and handling.

  • log4j.logger.org.continuent.sequoia.controller.VirtualDatabase : Virtual database related operations. A specific log4j.logger.org.continuent.sequoia.controller.VirtualDatabase.virtualDatabaseName logger is automatically created for each virtual database. This allows tuning a different logging level for each virtual database.

  • log4j.logger.org.continuent.sequoia.controller.VirtualDatabase.request : Log the incoming requests and transactions in files that can be replayed by the Request Player tool provided with Sequoia.

  • log4j.logger.org.continuent.sequoia.controller.distributedvirtualdatabase.request : Log distributed request execution when using horizontal scalability (a.k.a. controller replication).

  • log4j.logger.org.continuent.sequoia.controller.backup : Log backup manager and backuper related activities from dump/restore operations.

  • log4j.logger.org.continuent.sequoia.controller.VirtualDatabaseServerThread : The server thread accepts client connections and manages the worker threads.

  • log4j.logger.org.continuent.sequoia.controller.VirtualDatabaseWorkerThread : Each worker thread handles a session with a client Sequoia driver.

  • log4j.logger.org.continuent.sequoia.controller.RequestManager : Log the request flows between the different Request Manager components (scheduler, cache, load balancer, recovery log).

  • log4j.logger.org.continuent.sequoia.controller.scheduler : Log the request ordering and synchronization performed by the scheduler.

  • log4j.logger.org.continuent.sequoia.controller.cache : SQL Query cache related activities.

  • log4j.logger.org.continuent.sequoia.controller.loadbalancer : Log how requests are balanced on the backends.

  • log4j.logger.org.continuent.sequoia.controller.connection : Connection pooling related information.

  • log4j.logger.org.continuent.sequoia.controller.recoverylog : Sequoia Recovery Log information.

  • log4j.logger.org.continuent.sequoia.controller.console.jmx : JMX management system logging.

  • log4j.logger.org.continuent.hedera.channels: Hedera low level group communication channel.

  • log4j.logger.org.continuent.hedera.gms: Hedera Group Membership Service (GMS).

  • log4j.logger.org.continuent.tribe.discovery: Tribe Discovery Service (used by GMS).

  • log4j.logger.org.continuent.hedera.adapters: Hedera Multicast Dispatcher building block for application level message handling.

  • log4j.logger.org.jgroups: JGroups core messages when Hedera is used with JGroups.

  • log4j.logger.org.jgroups.protocols: JGroups protocol stack messages when Hedera is used with JGroups.
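
For example, to change the verbosity of individual loggers you could add lines like the following to log4j.properties (a minimal sketch; the appenders from your existing configuration stay unchanged):

# Trace how requests are balanced across the backends
log4j.logger.org.continuent.sequoia.controller.loadbalancer=DEBUG
# Keep recovery log messages at the default informational level
log4j.logger.org.continuent.sequoia.controller.recoverylog=INFO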

7.5. Recovery Log

When you want to add a database to your cluster, you do not want to stop the system, replicate the current database state to the new database (which may take a long time) and then restart the system. The Recovery Log helps you dynamically add a new backend (or recover a previously failed backend) without stopping the system.

The Recovery Log records the write operations and transactions that are performed by the Sequoia controller between checkpoints. A checkpoint is just a logical index in the log that reflects the recovery log state at a given time. As of Sequoia 2.0, checkpoints are automatically managed by the controller and are generated when needed on behalf of the administrator, for example when a backend is disabled or enters a backup phase. When re-enabling the backend, the Recovery Log replays all write queries and transactions that the backend missed while it was offline, and the backend comes back to the enabled state once it is synchronized with the other nodes.

Since version 2.0, the backup infrastructure has completely changed and is based on Backupers. We provide a generic Backuper based on Enhydra Octopus to copy, back up and restore the content of backends through JDBC. Even if Octopus is supposed to handle most common databases, it might fail for some specific databases or data types. In that case, we strongly recommend using or implementing a database-specific Backuper.

Note

Octopus currently fails to backup/restore empty databases. You need to have at least one table in your database if you don't want the backup operation to fail with Octopus.

7.5.1. A practical example

Your Web site is running with a single database and you want to use Sequoia with three nodes using full replication (RAIDb-1). You have two new backends ready to be installed. You can start the Sequoia console and connect to the controller. Start the administration module by connecting to the virtual database. Type: backup <backend name> <dump name> <backuper name> <path to backup directory>. If you want to use Octopus you will use a command line like backup node1 dump1 Octopus /var/backups. During the backup, the update requests are logged in the recovery log, so no update is lost. If the backend was in the enabled state when backup was initiated, it will automatically replay the recovery log to resynchronize itself and return to the enabled state.

To restore the dump on another backend, just type restore <newbackend> <dumpname> and the appropriate backuper (Octopus in our previous example) will be used to restore the dump. After restoring the dump, you can enable the backend at any time so that the recovery log replays all the missing requests since the dump was taken.

Here is the set of commands to use in the Sequoia console if node1 is your existing backend and you want to dynamically add node2 and node3:

backup node1 initial_dump Octopus /var/backups
restore node2 initial_dump
restore node3 initial_dump
enable node2
enable node3
        
Note

Note that these steps can be automated by scripting the console.

If a node crashes, use the administration console to restore the dump on the node using the restore command. Once the dump is restored, re-enable the backend from the stored checkpoint and the Recovery Log will automatically replay all the write queries to rebuild a consistent database state on the node.

To prevent the recovery log from being too large, you can periodically perform backup operations. This will also lower the recovery time since the part of the log to replay will be smaller. You can delete older dumps and logs if you do not need them anymore.

7.5.2. Understanding checkpoints

A checkpoint is a reference used by the recovery log to replay missing requests. If a backend is disabled from the console for maintenance, the controller will automatically create a checkpoint (in C-JDBC, the checkpoint name had to be provided manually through the console). Once the backend is enabled again, the controller retrieves its last known checkpoint from the recovery log and replays all the requests that the disabled backend missed since it was disabled. A checkpoint is nothing more than a reference in time.

7.5.3. A fault tolerant Recovery Log

As the Sequoia recovery log can be stored in any database providing a JDBC driver, it is possible to make the recovery log fault tolerant by redirecting it to a Sequoia controller (even the controller itself) that will distribute and replicate the log content on several backends.

The JDBC Recovery Log configuration is detailed in Section 10.6.5, “Recovery Log”.
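
As a rough sketch only, and assuming the RecoveryLog element described there takes a JDBC driver, URL, login and password, redirecting the recovery log to a Sequoia virtual database could look like this (names and credentials are placeholders; see Section 10.6.5 for the exact syntax):

<RecoveryLog driver="org.continuent.sequoia.driver.Driver"
             url="jdbc:sequoia://localhost:25322/recoveryDB"
             login="user" password="password">
  <!-- recovery log tables as described in Section 10.6.5 -->
</RecoveryLog>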

7.6. Controller replication

To prevent the Sequoia controller from being a single point of failure, Sequoia provides controller replication, also called horizontal scalability. A virtual database can be replicated on several controllers that can be added dynamically at runtime. Controllers use the JGroups group communication middleware to synchronize updates in a distributed way. The JGroups stack configuration is found in config/jgroups.xml and should not be altered unless you specifically know what you are doing. Keep in mind that total order reliable multicast is needed to ensure proper synchronization of the controllers. More information about JGroups can be found on the JGroups web site. Note that JGroups requires proper network settings; here are a few guidelines:

  • a default route must be defined (check with /sbin/route under Linux) for the network adapter which is bound by JGroups (usually eth0). If such a route does not exist, either the group communication initialization will block or controllers will not be able to see each other, even on the local host. If you don't have any default entry in your routing table, you can use a command like '/sbin/route add default eth0' to define this default route.

  • issues have been reported with DHCP that can either block (under Windows) or just fail to properly set a default route, leading to the issue reported above. We strongly discourage the use of DHCP; you should use fixed IP addresses instead.

  • name resolution should be properly set up so that the IP address/machine name matching works both ways. An improper /etc/hosts or DNS configuration often leads to group communication initialization problems. In particular, under Linux, the IP address associated with the name returned by the 'hostname' command must not resolve to 127.0.0.1, otherwise controllers will not see each other.

Horizontal scalability can also be provided using Appia. The Appia stack configurations are found in config/appia.xml. This file contains six different configurations: six templates for communication channels and their respective channel instantiations. These are the combinations of two total order implementations (sequencer based and token based total order) using different transport protocols: TCP, UDP and UDP multicast. Instructions to change the default configuration are in the header of the file. All the defined configurations ensure total order reliable multicast. More information about Appia can be found on the Appia web site. Note that Appia also requires proper network settings; here are some guidelines:

  • a default route must be defined (check with /sbin/route under Linux) for the network adapter which is bound by Appia (usually eth0). If such a route does not exist, controllers will not be able to see each other. If you don't have any default entry in your routing table, you can use a command like '/sbin/route add default eth0' to define this default route.

  • name resolution should be properly set up so that the IP address/machine name matching works both ways. An improper /etc/hosts or DNS configuration often leads to group communication initialization problems. In particular, under Linux, the IP address associated with the name returned by the 'hostname' command must not resolve to 127.0.0.1, otherwise controllers will not see each other.

  • Appia does not need to use fixed IP addresses, unless you want to bind a controller to a specific IP address. To discover other controllers Appia uses a gossip service. The gossip service can be configured to use a multicast address (if your network supports it) or you can start a gossip server. This server can also be replicated and is used just to help the dynamic discovery of new nodes.

In order for a virtual database to be replicated, you must define a Distribution element in the virtual database configuration file (see Section 10.2.1, “Distribution”). There are several constraints for different controllers to replicate a virtual database:

  • give the list of all controllers that you plan to use for replication of your virtual database in the Sequoia driver URL. Even if all controllers are not online at all times, the driver will automatically detect which controllers are alive: jdbc:sequoia://node1,node2,node3,node4/myDB

  • the virtual database must have the same name and use the same groupName (in the Distribution element).

  • each controller must have its own set of backends and no backend should be shared between controllers (Sequoia checks the database URLs; having different backend names is not sufficient).

  • each controller must have its own recovery log; recovery logs cannot be shared. It is possible for a controller not to have a recovery log, but this controller will have no recovery capabilities.

  • the authentication managers must support the same logins.

  • schedulers and load balancers must implement the same RAIDb configuration.

  • database schemas (if defined) must be compatible according to the RAIDb level you are using.

Note

As backends cannot be shared between controllers, it is not possible to use a SingleDB load balancer with controller replication. If each controller only has a single database backend attached to it, then you must use a RAIDb-1 configuration since you in fact have two replicated backends in the cluster.

Several configuration file examples are available in the doc/examples/HorizontalScalability directory of your Sequoia distribution.

Note

You can find more information in the document titled "Sequoia Horizontal Scalability - A controller replication user guide" available from the Sequoia web site.

7.7. Current Limitations

The Sequoia controller in its 2.10 release has the following limitations:

  • GRANT/REVOKE commands will be sent to the database engines but this will not add or remove users from the virtual database authentication manager.

  • network partition/reconciliation is not supported.

  • distributed joins are not supported, which means that you must ensure that every query can be executed by at least one backend.

  • RAIDb-1ec and RAIDb-2ec levels are not supported.