BOOKMARKS database and internet robot

   Here is a set of classes, libraries and programs I use to manipulate my
bookmarks.html. I like Netscape Navigator, but I need more features, so I am
writing these programs for my needs. I need to extend Navigator's "What's new"
feature (Navigator 4 named it "Update bookmarks").

   These programs are intended to run as follows.
1. bkmk2db converts bookmarks.html to bookmarks.db.
2. check_urls (Internet robot) runs against bookmarks.db, checks every URL and
   saves results in check.db.
3. db2bkmk converts bookmarks.db back to bookmarks.html.
   Then I use this bookmarks file and...
4. bkmk2db converts bookmarks.html to bookmarks.db.
5. check_urls (Internet robot) runs against bookmarks.db, checks every URL and
   saves results in check.db (old file copied to check.old).
6. (A yet unnamed program) will compare check.old with check.db and generate
   a detailed report. For example:
      this URL is unchanged
      this URL is changed
      this URL is unavailable due to: host not found...


AUTHOR
   Oleg Broytman


COPYRIGHT and LEGAL ISSUES
   All programs copyrighted by Oleg Broytman and PhiloSoft Design.
   Copyright (C) 1997-2000 PhiloSoft Design
   All sources protected by GNU GPL. Programs are provided "as-is", without
   any kind of warranty. All usual blah-blah-blah.

   #include LICENSE GPL


STATUS
   Parser is OK.
   Storage managers: pickle, FLAD (Flat ASCII Database).
   Writers: HTML, text, FLAD.
   Robots: simple, forking.


------------------------------ bkmk2db ------------------------------

NAME
   bkmk2db.py - script to convert bookmarks.html to a database.

SYNOPSIS
   bkmk2db.py [-is] [/path/to/bookmarks.html]

DESCRIPTION
   bkmk2db.py splits the given file (or ./bookmarks.html) into a database
   (using a storage plugin).

   Options:
   -i
      Inhibit progress bar. Default is to display the progress bar if
      stderr.isatty()

   -s
      Suppress output of statistics at the end of the program. Default is to
      write how many lines the program read and how many URLs it parsed.
      Also suppress some messages during the run.
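
   The conversion described for bkmk2db can be illustrated with a small
   sketch. This is not the parser shipped with these programs: it uses the
   standard html.parser module of current Python, and the names
   BookmarkCollector, parse_bookmarks and SAMPLE, as well as the record
   fields "URL" and "Title", are assumptions made only for the example.
   Folders, dates and aliases are ignored.

```python
import pickle
from html.parser import HTMLParser

# A tiny bookmarks file in the Netscape layout (made up for this example).
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>Python</H3>
    <DL><p>
        <DT><A HREF="http://www.python.org/" ADD_DATE="900000000">Python Home</A>
    </DL><p>
    <DT><A HREF="ftp://ftp.example.org/pub/">Example FTP</A>
</DL><p>
"""


class BookmarkCollector(HTMLParser):
    """Collect one record per <A HREF=...> tag (hypothetical record layout)."""

    def __init__(self):
        super().__init__()
        self.records = []   # parsed bookmarks, in file order
        self._href = None   # HREF of the <A> tag currently open, if any
        self._title = []    # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._title = []

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._title).strip()
            self.records.append({"URL": self._href, "Title": title})
            self._href = None


def parse_bookmarks(text):
    """Return a list of bookmark records found in the given HTML text."""
    collector = BookmarkCollector()
    collector.feed(text)
    collector.close()
    return collector.records


def save_records(records, path):
    """Store records with pickle, one of the storage managers named in STATUS."""
    with open(path, "wb") as f:
        pickle.dump(records, f)
```

   Calling parse_bookmarks(SAMPLE) yields two records, one for each <A> tag;
   the folder heading "Python" is skipped because it is not a link.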

BUGS
   Aliases are not supported (yet).


------------------------------ db2bkmk ------------------------------

NAME
   db2bkmk.py - script to reconstruct bookmarks.html back from a database.

SYNOPSIS
   db2bkmk.py [-is] [-o output_file] [-t dict.db [-r]]

DESCRIPTION
   db2bkmk.py reads bookmarks.db and creates two HTML files -

   Options:
   -i
      Inhibit progress bar. Default is to display the progress bar if
      stderr.isatty()

   -s
      Suppress output of statistics at the end of the program. Default is to
      write how many records the program processed and how many URLs it
      created. Also suppress some messages during the run.

   -o output_file
      Put output into a different file.

   -t dict.db
      For most tasks, if someone needs to process bookmarks.db in a regular
      way (for example, replace all "gopher://gopher." with "http://www."),
      it is easy to write a special program that processes every DB record.
      But there are cases when someone needs to process bookmarks.db in a
      non-regular way: one URL must be changed in one way, another URL in a
      second way, etc. The -t option allows the use of an external
      dictionary for such translation. The dictionary itself is a FLAD
      database, where every record has two keys, URL1 and URL2. With the -t
      option in effect, db2bkmk generates a translated version of
      bookmarks.html, where every URL1 is replaced with the corresponding
      URL2 from the translation dictionary. (See koi2win.db for an example
      of a translation dictionary.)

   -r
      Reverse the effect of the -t option: translate from URL2 to URL1.


------------------------------ check_urls -----------------------------

NAME
   check_urls.py - Internet robot

SYNOPSIS
   check_urls.py [-is]

DESCRIPTION
   check_urls.py runs a robot plugin against every URL. An additional field,
   Error, appears in records that could not be checked for some reason; the
   reason is the content of the Error field.

   Options:
   -i
      Inhibit progress bar. Default is to display the progress bar if
      stderr.isatty()

   -s
      Suppress output of statistics at the end of the program. Default is to
      write how many records the program processed and how many URLs it
      checked. Also suppress some messages during the run.

BUGS
   Ugly mechanism to catch welcome message from FTP server (from urllib).
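
   The Error field described for check_urls can be illustrated with a small
   sketch. It is not the robot plugin itself: it uses urllib.request from
   current Python, and the names check_record, check_all, fake_ok and
   fake_unreachable, the "URL" record key and the opener parameter are
   assumptions made for the example. The two fake openers stand in for the
   network so the sketch can be tried offline.

```python
import io
import urllib.error
import urllib.request


def check_record(record, opener=urllib.request.urlopen, timeout=30):
    """Try to open record["URL"]; on failure store the reason in "Error"."""
    result = dict(record)  # leave the caller's record untouched
    try:
        with opener(record["URL"], timeout=timeout) as response:
            response.read(1)  # touch the response to confirm it is readable
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # The reason the URL could not be checked becomes the field content.
        result["Error"] = str(exc)
    return result


def check_all(records, opener=urllib.request.urlopen):
    """Check every record; return results plus the two statistics counters."""
    results = [check_record(r, opener) for r in records]
    checked = sum(1 for r in results if "Error" not in r)
    return results, len(results), checked


def fake_ok(url, timeout=None):
    """Stand-in opener that always succeeds (no network access)."""
    return io.BytesIO(b"hello")


def fake_unreachable(url, timeout=None):
    """Stand-in opener that always fails like an unresolvable host."""
    raise urllib.error.URLError("host not found")
```

   With fake_unreachable as the opener, every record comes back with an
   Error field whose text contains "host not found"; with fake_ok, no Error
   field is added and the record counts as checked.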