Class | Repository::Repository |
In: | lib/fastcst/repo.rb |
Parent: | Object |
Represents a single directory-based repository stored on disk and supports all the operations you can do to a repository. The Repository’s job is to manage the information and changesets that someone deals with during their daily work with FastCST. It doesn’t do any changeset creation or application. It doesn’t handle distribution or replication either. It simply provides an API to control the raw repository.
The repository layout is really very simple:
The env.yaml contains all the configuration information that fcst and the Repository object need to operate. This file maintains information like the developer’s name, e-mail, common settings for commands, command history for the shell, etc. It is updated infrequently in response to user activities and is only kept "live" when it needs to be modified.
I originally designed the repository to have an "index file" that was used to keep track of the repository structure in a .yaml file so it could be loaded quickly. After testing I found that this was entirely unnecessary since very large revision trees could be loaded by direct analysis very quickly (less than a second to load a 1000-node repository). As long as the revision tree is cached and updated when store/delete operations are called, everything works really quickly. This simplifies the design quite a bit and doesn’t require any weird maintenance of an external file.
The downside to this design is that it makes it nearly impossible for a person to go in and see what’s in the repository. I’ll hopefully have a set of commands or a tool to help with this once I figure out what people need from it.
The root directory contains all of the changesets that the user may have downloaded from others. The root directory is very simple in that it just contains a flat set of directories named after the lowercase string of each changeset’s UUID. Inside each of these directories is the changeset’s .md meta-data file and any of its contents.
Since the revision tree for any repository is determined by the meta-data files and how they reference other changeset UUIDs, it’s not necessary to store them in a tree. It’s easier to store them in a way that is convenient for scanning and finding the changesets.
Another advantage of this configuration is that it makes synchronizing from multiple external repositories a piece of cake since none of the directories will clash. It also makes it possible to upload new changesets to a repository without interfering with people currently downloading.
Humans don’t really work with big numbers very well, so there must be a way of presenting a changeset that is easy for them to digest. A first whack at it might be:
[revision_name]-[uuid_chunk](-email)
The revision name is just whatever the creator set as the revision name in the meta-data. Since this can easily conflict with other revision names, a small piece of the UUID is added (say 3 characters from the beginning). This seems to create unique enough names that don’t overload the reader. In the rare case that both the revision name and the uuid_chunk are the same, we just tack on the e-mail address of the creator to make the name fully unique. It should be really rare for two revisions to have the same name and uuid_chunk at the same place in the revision tree.
I think a good way to understand the repository layout is to describe how someone would build a repository from scratch with just a set of changesets. This would be necessary in situations where someone damaged their changeset or would like to start from scratch. One of the design goals is that you can create a fully working repository with just a set of changesets.
The process would most likely be something like this:
There’s a lot of hand-waving in that description, but in theory it should work.
DEFAULT_FASTCST_DIR | = | ".fastcst" |
env_yaml | [R] | |
originals_dir | [R] | |
path | [R] | |
pending_mbox | [R] | |
plugin_dir | [R] | |
root_dir | [R] | |
work_dir | [R] |
Creates a new repository at the given path so you can use Repository.instance to get the repository information.
The path given is created if it does not exist. The newly created repository is completely barren and useless. To get it into a reasonable state you’d then need to add some changesets and fill in the originals dir.
Opens the repository that is at the given path which should be the full path to the top of the repository (where the env.yaml file is located).
Searches for repo_dir (like .fastcst) by going up the tree and returns the first full path that it finds. The given directory must be properly formed and contain an env.yaml file plus all directories.
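The upward search can be sketched roughly as follows. This is only an illustration, not the library's code: the method name search_for_repo is hypothetical, and the check is simplified to "the directory exists and holds an env.yaml", whereas the real method also expects the other repository directories to be present.

```ruby
# Hypothetical helper, not part of FastCST: walk upward from start_dir
# until a directory named repo_dir that contains env.yaml is found.
def search_for_repo(start_dir, repo_dir = ".fastcst")
  current = File.expand_path(start_dir)
  loop do
    candidate = File.join(current, repo_dir)
    # Simplified validity check (assumption): only env.yaml is required here.
    if File.directory?(candidate) && File.file?(File.join(candidate, "env.yaml"))
      return candidate
    end
    parent = File.dirname(current)
    return nil if parent == current # reached the filesystem root
    current = parent
  end
end
```

Calling it from any directory below the repository returns the full path of the nearest enclosing repo_dir, or nil when none is found.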
Generates a human readable name from the given uuid, trying to add information as necessary to make it unique among its siblings.
The algorithm for this is simply:
1. Create: [revision_name]
2. Get this uuid's siblings (parent's immediate children)
3. If the list of children contains a revision with the same [revision_name] then add the [uuid_chunk]. If it's still the same then add the e-mail address.
This makes sure that the name is unique enough for a person to read, but still avoids clashes between the children. It doesn’t handle different revisions with the same name in other parts of the tree, but the rationale is that this won’t be necessary since most operations are done relative to the current revision path.
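As a rough sketch of the algorithm above, assuming the meta-data is already available as an in-memory hash: the method name build_readable_name, the :revision and :email keys, and the default chunk length of 3 are all illustrative assumptions, not the library's actual structures.

```ruby
# Hypothetical sketch: meta maps uuid => { revision: "...", email: "..." },
# and siblings is the list of uuids that share the same parent.
def build_readable_name(uuid, siblings, meta, chunk_len = 3)
  name = meta[uuid][:revision]                 # step 1
  others = siblings - [uuid]                   # step 2
  same_name = others.select { |s| meta[s][:revision] == name }
  return name if same_name.empty?

  # Step 3: a sibling shares the name, so append a piece of the uuid.
  chunk = uuid[0, chunk_len]
  readable = "#{name}-#{chunk}"
  # Still ambiguous if a same-named sibling also shares the chunk.
  if same_name.any? { |s| s[0, chunk_len] == chunk }
    readable = "#{readable}-#{meta[uuid][:email]}"
  end
  readable
end
```

Only siblings are compared, so the same readable name may still appear elsewhere in the tree, which matches the rationale given above.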
Removes a changeset from the repository based on the uuid. Returns the same information as find_changeset so the caller can analyze the results.
Used to get default values which can be overridden by command line settings, and display a message if there’s a failure.
A convenience method that returns the UUIDs of all children of this changeset UUID. It’s currently horribly inefficient since it rebuilds the revision tree by scanning the changeset directory each time it’s called. This is mostly done right now since it is simple and works without needing to update anything fancy.
This actually turns out to be reasonably fast. Initial tests said it can process 1000 randomly structured changesets in less than a second.
Given a uuid it will return an array of [full_path, meta_data] so that you can load the meta-data and any contained files. The full_path is the directory where the meta_data structure’s contents reside.
Builds a list of all changesets which match the given name. It returns an Array of [revision_name, uuid] for each one found. You can set type to :revision or :id and it will dynamically look up based on that.
Returns a list of all the changesets in the root directory by loading the root directory contents and grepping for /^[a-zA-Z0-9]/ which works since all UUIDs match this format.
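A minimal sketch of that listing step is shown below. The method name list_changesets is a placeholder, and the extra directory check and the sort are assumptions added for the example rather than something the description above states.

```ruby
# Hypothetical sketch: list changeset directory names under root_dir.
def list_changesets(root_dir)
  # The grep drops ".", ".." and any other dot-prefixed entries.
  names = Dir.entries(root_dir).grep(/^[a-zA-Z0-9]/)
  # Assumption: keep only directories, and sort for stable output.
  names.select { |n| File.directory?(File.join(root_dir, n)) }.sort
end
```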
Dynamically resolves an id when given either a revision name or an id (or both). It will adapt depending on whether rev, id, or both are given. The logic is that it tries to find a unique id for a revision named after rev; if rev isn’t given then it checks that the given id is valid. Finally, if neither is given then it returns the top of the current path. If it returns nil then it couldn’t find anything useful.
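The resolution order can be illustrated with a small sketch. Everything here is an assumption made for the example: the method name resolve_id_sketch, the changesets hash of uuid => revision name, treating the last element of current_path as the "top", and using a supplied id to narrow ambiguous rev matches.

```ruby
# Hypothetical sketch: changesets maps uuid => revision name, and
# current_path is an array of uuids whose last element is the "top".
def resolve_id_sketch(changesets, current_path, rev = nil, id = nil)
  if rev
    matches = changesets.select { |_uuid, name| name == rev }.keys
    matches &= [id] if id # assumption: a given id narrows the rev matches
    matches.length == 1 ? matches.first : nil
  elsif id
    changesets.key?(id) ? id : nil
  else
    current_path.last
  end
end
```

A nil result covers both "nothing matched" and "rev alone was ambiguous", which mirrors the "couldn't find anything useful" case described above.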
Returns the "revision tree hash" which is a simple two-level representation of all the UUIDs that have children and their children as an array. It caches the results of building the revision tree in @cached_rev_tree and will just return that unless force==true. Other functions will set the @cached_rev_tree = nil in order to have this function rebuild the tree the next time it is requested.
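The caching behaviour might look something like the sketch below. The class RevTreeSketch, its parents hash (uuid => parent uuid or nil), and the builds counter are hypothetical stand-ins; the real method derives parent links from the meta-data files on disk.

```ruby
# Hypothetical sketch of the cache-and-invalidate pattern described above.
class RevTreeSketch
  attr_reader :builds # counts rebuilds, only to make the caching visible

  def initialize(parents)
    @parents = parents # assumption: uuid => parent uuid (nil for a root)
    @cached_rev_tree = nil
    @builds = 0
  end

  # Returns { parent_uuid => [child_uuid, ...] }, rebuilding only on demand.
  def build_rev_tree(force = false)
    return @cached_rev_tree if @cached_rev_tree && !force
    @builds += 1
    tree = {}
    @parents.each do |uuid, parent|
      (tree[parent] ||= []) << uuid if parent
    end
    @cached_rev_tree = tree
  end

  # Store operations drop the cache so the next request rebuilds the tree.
  def store(uuid, parent)
    @parents[uuid] = parent
    @cached_rev_tree = nil
  end
end
```

Only UUIDs that have children appear as keys, which keeps the structure two-level as described.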