Bookmarks monitoring module documentation

Adrien Di Mascio

Etienne HERMAN

Sylvain THENAULT


Table of Contents

Install
Parameter
Recipe which updates the bookmarks file: Update_bmf
Parse_bmf
update_bmf
Recipe which checks which pages have been modified: check_site
get_nodes
Compute_checksum
replace_nodes
get_xslt_result
write_xml_file
Sorting bookmarks
Future improvements

Abstract

The aim of this module is to enable bookmark monitoring. It is made of several recipes which:

  • Create and update the bookmarks file stored in $NARVAL_HOME/data.

  • Check which Web pages, among the bookmarks, have changed since the last visit.

  • Sort bookmarks by theme.

This module is a part of Infopal.

Install

Use the Narval package manager tool npm.py (in Narval/narval/) to install the Infopal package, downloadable from the FTP site.

Parameter

You must specify the bookmarks file to manage by adding a bookmark-file element to the $NARVAL_HOME/data/memory.xml file.

  • If your file is a Netscape HTML bookmarks file, add an element like this example:

    <bookmark-file type="netscape">$HOME/.netscape/bookmarks.html</bookmark-file>
          
    where $HOME is your home directory.

  • Otherwise, if your file is a list of URLs, add an element like this example:

    <bookmark-file type="raw">$HOME/revpresse.txt</bookmark-file>
          

Recipe which updates the bookmarks file: Update_bmf

The recipe is composed of two steps:

Parse_bmf

This first step converts the bookmarks file to XBEL. Once this step is done, the generated XBEL element is in Narval's memory, but it is not yet stored in a file.
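
As an illustration of this conversion, here is a minimal sketch for the "raw" case (one URL per line). It is not Narval's actual Parse_bmf code: the function name raw_urls_to_xbel is hypothetical, and using the URL itself as the bookmark title is an assumption, since a raw list carries no titles. The example.org URLs are placeholders.

```python
import xml.etree.ElementTree as ET

def raw_urls_to_xbel(lines):
    """Build an in-memory XBEL element from an iterable of URL lines."""
    xbel = ET.Element("xbel", version="1.0")
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        bookmark = ET.SubElement(xbel, "bookmark", href=url)
        # Assumption: a raw list has no titles, so the URL doubles as title.
        ET.SubElement(bookmark, "title").text = url
    return xbel

# The element only lives in memory here; nothing is written to disk.
xbel = raw_urls_to_xbel(["http://www.example.org/\n", "\n", "http://www.example.org/news\n"])
print(ET.tostring(xbel, encoding="unicode"))
```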

update_bmf

This step will try to find an XBEL file in the $NARVAL_HOME/data directory.

  • If a file is found, the recipe updates it (removes the bookmarks which are no longer used, adds the new ones, etc.). We can't simply copy over the XBEL generated at the previous step, because the XBEL file in $NARVAL_HOME/data may contain specific information on visited pages (the checksum and changed attributes), which is used to check whether these pages have been modified (cf. the check_site recipe).

  • If the file doesn't exist, then this step will just create it and write out the XBEL element obtained at the previous step.
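
The update rule above can be sketched as follows. This is only an illustration under stated assumptions: bookmarks are matched by their href attribute (the original does not say how Narval matches them), and merge_xbel is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Attributes written by the check_site recipe that must survive an update.
PRESERVED = ("checksum", "changed")

def merge_xbel(stored, fresh):
    """Return `fresh` with monitoring attributes carried over from `stored`.

    Bookmarks missing from `fresh` are dropped, new ones are kept as-is.
    Assumption: two bookmarks are the same if their href is the same.
    """
    known = {b.get("href"): b for b in stored.iter("bookmark")}
    for bookmark in fresh.iter("bookmark"):
        old = known.get(bookmark.get("href"))
        if old is None:
            continue  # new bookmark, nothing to carry over
        for name in PRESERVED:
            if old.get(name) is not None:
                bookmark.set(name, old.get(name))
    return fresh

stored = ET.fromstring(
    '<xbel><bookmark href="http://a.example/" checksum="abc" changed="no"/>'
    '<bookmark href="http://gone.example/" checksum="def"/></xbel>')
fresh = ET.fromstring(
    '<xbel><bookmark href="http://a.example/"/>'
    '<bookmark href="http://new.example/"/></xbel>')
merged = merge_xbel(stored, fresh)
print(ET.tostring(merged, encoding="unicode"))
```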

Recipe which checks which pages have been modified: check_site

The result of the previous recipe is that the XBEL file $NARVAL_HOME/data/bookmarks.xbel is now up to date. This recipe will now test which of the bookmarked pages have been modified since the last check. The location of the HTML file produced by this recipe can be set as wished by editing the recipe with Horn and changing the "html-file" element value in the get_xslt_result arguments (the default is file://$NARVAL_HOME/data/modified_pages.html).

Here is a global description of how this recipe works: starting from a bookmarks XBEL file, we extract the bookmarks and calculate, for each one, a checksum of the corresponding Web page, which we compare with the former checksum if we've got one. We then know which pages have been modified. The last part of the recipe builds an HTML page listing all the modified pages.

get_nodes

This step retrieves the bookmark elements from an XBEL file. We can change the XBEL source (the XBEL file location) by editing the recipe and changing the xml-repository value in the arguments. The retrieved elements are stored in Narval's memory and will be processed in the next step.
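
A minimal sketch of this retrieval, using Python's standard ElementTree rather than Narval's own machinery. The function name get_bookmark_nodes and the sample document are hypothetical; the point is only that bookmarks nested inside folders are collected too.

```python
import io
import xml.etree.ElementTree as ET

def get_bookmark_nodes(xbel_source):
    """Return every <bookmark> element of an XBEL document, at any depth.

    `xbel_source` may be a file name or a binary file object.
    """
    root = ET.parse(xbel_source).getroot()
    return list(root.iter("bookmark"))

# In-memory stand-in for the XBEL file, so the sketch needs no file on disk.
SAMPLE = b"""<xbel version="1.0">
  <bookmark href="http://a.example/"><title>A</title></bookmark>
  <folder>
    <title>News</title>
    <bookmark href="http://b.example/"><title>B</title></bookmark>
  </folder>
</xbel>"""

nodes = get_bookmark_nodes(io.BytesIO(SAMPLE))
print([node.get("href") for node in nodes])
```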

Compute_checksum

This step works on each bookmark element stored in Narval's memory at the previous step. For each bookmark, the algorithm is:

  • Retrieve the URL referenced by the bookmark.

  • Calculate a checksum for this page.

  • If the bookmark element already has a checksum attribute, we compare it to the one we've just calculated. If they are equal, the page hasn't been modified, so we set the changed attribute to no. Otherwise, the page has been modified, so we set the changed attribute to yes and update the checksum attribute value.

  • If no checksum attribute is found, we initialize the checksum attribute with the value we've just calculated.

This step will create in Narval's memory a bookmark element for each bookmark found in the XBEL file.
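
The algorithm above can be sketched as follows. Two assumptions are made: the checksum is a SHA-256 digest (the original does not say which checksum Narval computes), and the page content is passed in as bytes so that the sketch runs without network access. In a real run it could be fetched with urllib.request.urlopen(url).read(). The function name update_checksum is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

def update_checksum(bookmark, content):
    """Apply the checksum/changed rules to one <bookmark> element.

    `content` is the page body as bytes. SHA-256 is an assumption.
    """
    new_sum = hashlib.sha256(content).hexdigest()
    old_sum = bookmark.get("checksum")
    if old_sum is None:
        # First visit: only initialise the checksum.
        bookmark.set("checksum", new_sum)
    elif old_sum == new_sum:
        bookmark.set("changed", "no")
    else:
        bookmark.set("changed", "yes")
        bookmark.set("checksum", new_sum)
    return bookmark

bookmark = ET.Element("bookmark", href="http://www.example.org/")
update_checksum(bookmark, b"first version")   # no checksum yet: initialised
update_checksum(bookmark, b"first version")   # same content: changed is "no"
update_checksum(bookmark, b"second version")  # new content: changed is "yes"
print(bookmark.attrib)
```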

replace_nodes

This step updates the XBEL file by replacing its bookmark elements with those stored in Narval's memory, which were generated at the previous step.
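
One way to picture this replacement, again as a sketch and not Narval's code: each bookmark of the tree is swapped, in place, for the updated element with the same href. Matching by href and the name replace_bookmarks are assumptions.

```python
import xml.etree.ElementTree as ET

def replace_bookmarks(root, updated):
    """Swap each <bookmark> in `root` for the element of `updated`
    that has the same href, keeping its position in the tree."""
    by_href = {b.get("href"): b for b in updated}
    # Snapshot the parents first, so the tree is not modified while walked.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "bookmark" and child.get("href") in by_href:
                parent[index] = by_href[child.get("href")]
    return root

root = ET.fromstring(
    '<xbel><folder><bookmark href="http://a.example/"/></folder>'
    '<bookmark href="http://b.example/"/></xbel>')
updated = [ET.Element("bookmark", href="http://a.example/", changed="yes")]
replace_bookmarks(root, updated)
print(ET.tostring(root, encoding="unicode"))
```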

get_xslt_result

This step is a transformation. It browses the XBEL file updated at the previous step and creates a list of all the modified pages by looking at the changed attribute value of the bookmark elements. The result is an xhtml element, which is stored in Narval's memory.
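
The recipe itself relies on an XSLT transformation. Since Python's standard library has no XSLT processor, the sketch below swaps in plain ElementTree code that produces a comparable result. The page layout (an h1 followed by a list of links) and the name modified_pages_xhtml are assumptions, not the output of the real stylesheet.

```python
import xml.etree.ElementTree as ET

def modified_pages_xhtml(root):
    """Build an xhtml element listing the bookmarks whose changed
    attribute is "yes"."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = "Modified pages"
    ul = ET.SubElement(body, "ul")
    for bookmark in root.iter("bookmark"):
        if bookmark.get("changed") != "yes":
            continue
        href = bookmark.get("href")
        link = ET.SubElement(ET.SubElement(ul, "li"), "a", href=href)
        # Fall back to the URL when the bookmark has no <title> child.
        link.text = bookmark.findtext("title", default=href)
    return html

root = ET.fromstring(
    '<xbel><bookmark href="http://a.example/" changed="yes"><title>A</title></bookmark>'
    '<bookmark href="http://b.example/" changed="no"><title>B</title></bookmark></xbel>')
page = modified_pages_xhtml(root)
print(ET.tostring(page, encoding="unicode"))
```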

write_xml_file

This step will retrieve the xhtml element generated at the previous step and then write it out into the file $NARVAL_HOME/data/modified_pages.html. Here again, this file location can be changed by editing the recipe.

Sorting bookmarks

The classify recipes provide an adaptable sorting method. It uses the mobile points method and, if there are enough bookmarks to sort, the TF-IDF method (Term Frequency - Inverse Document Frequency) as well. See the Classify.classify action documentation if you want more control over the sort.
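
For readers unfamiliar with TF-IDF, here is a sketch of the textbook weighting (term frequency times the logarithm of the inverse document frequency). The exact variant used by the Classify.classify action is not described here, so this formula is an assumption, and the mobile points method is not illustrated.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document.

    weight = (count of term in doc / doc length) * log(N / document frequency)
    """
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Each "document" stands for the words of one bookmarked page.
docs = [["python", "xml", "python"], ["xml", "news"]]
weights = tf_idf(docs)
print(weights)
```

A term present in every page (here "xml") gets a weight of zero, so only the terms that distinguish a page from the others contribute to its theme.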

After a few minutes, you will obtain the $NARVAL_HOME/data/bookmarks_sorted.html file.

Future improvements

  • For the moment, only Netscape/Mozilla bookmarks, raw files containing a URL list, and XBEL-compliant files can be processed, but we'll soon add other browsers such as Explorer, Lynx (a beta version is available), and Opera. (Others may follow.)

  • Allow the user to obtain more precise information on pages which have been modified, for instance by indicating which parts of the page were modified.