PlanetMath Task Tree

1 Content

1.1 FEM

Work in progress at https://github.com/planetmath

2 Catalog

2.1 Centralized Bibliographic Database

2.1.1 TODO Locate LOC catalog and extract math part.

Note that, for books from the last 50 years, simply pulling out class QA is not good enough, since that will also include every computer manual.
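
One way to separate the math part from the computing part is to filter on the call number rather than on the class letters alone. The sketch below is only illustrative: it assumes each catalog record carries an LC call number string, and it assumes that the computing material sits in QA75 through QA76.95, which should be checked against the LCC outline before use. The function name and the bounds are hypothetical, not part of any existing tool.

```python
import re

# Assumption: call numbers look like "QA331.7 .C66" (class letters, then a
# decimal class number). Records without a QA call number are rejected.
CALL_NUMBER = re.compile(r"^\s*QA\s*(\d+(?:\.\d+)?)")

# Assumption: computing manuals fall in QA75 through QA76.95. Verify these
# bounds against the LCC outline; they are not taken from the catalog itself.
COMPUTING_RANGE = (75.0, 76.95)


def is_math_call_number(call_number: str) -> bool:
    """Return True for QA call numbers outside the assumed computing range."""
    match = CALL_NUMBER.match(call_number.upper())
    if match is None:
        return False
    number = float(match.group(1))
    low, high = COMPUTING_RANGE
    return not (low <= number <= high)
```

For example, `is_math_call_number("QA331.7 .C66")` is true while `is_math_call_number("QA76.73.P98")` is false. Books that are classed outside QA but are still mathematical would need a separate pass.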

2.1.2 TODO Put the resulting math catalog online.

2.1.3 TODO Figure out how to work math-specific metadata (such as what we had in our old tags) into BibJSON or whatever format we adopt.

2.1.4 TODO Extract math-specific metadata from publisher catalogs and other sources.

2.2 PM-Xi

Work in progress is in home/shared-folder/index-project and is available via Git once access is established.

git clone git@li311-58.members.linode.com:/opt/git/index.git

2.2.1 Phase I: Examples and demos

We start small with a handful of indices. At this scale, we don't need much in the way of automated tools and can do things by hand. The aim is to make some simple proofs of concept and get a feel for how the project goes.

  • TODO Collect indices for Complex Analysis and Category Theory
    At first, start with about two dozen indices for each subject. The books have already been identified. This will provide a useful demo of what a whole subject might look like.
  • TODO Refactor and finish zml to html converter
    It should be possible to parse the format with regular expressions, as a wiki engine does. Try out xmlgen as a code generator.
  • TODO Public domain books
    In addition to old books on complex analysis, pull up some more indices from public domain math books. An important feature of these is that, in addition to the indices and the tables of contents, the full text of the book is available for use and public hosting.

2.2.2 Phase II: Combine indices

Just putting a bunch of book indices in the same file format in the same folder — whether it's 50 of them or 500 or 5000 — does not constitute a cross-index. To rise to that level, we need to turn this discrete set into an interacting whole and make explicit the relations between terms across the literature.

  • TODO Generate topic indices
    Write tools that take a collection of book indices, select books by metadata criteria such as topic, and combine their indices into a single comprehensive index for that subcollection.
  • TODO Study relations between words
    Find occurrences of index terms in the body of a work. See which terms are more likely to occur in the same sentence or paragraph, which terms are likely to precede which terms, etc.
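
The two tasks above could be prototyped along the following lines. This is a minimal sketch under stated assumptions, not the project's actual format: it assumes each book is a dictionary with hypothetical keys "title", "topics" and "index", and it uses naive substring matching and a naive sentence splitter, both of which a real tool would need to refine.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical data model: each book is a dict with "title" (str),
# "topics" (set of str) and "index" (dict mapping term -> list of pages).


def merge_indices(books, topic):
    """Combine the indices of all books tagged with `topic`.

    Returns {term: [(book title, sorted pages), ...]}, with terms
    lowercased so the same term from different books shares one heading.
    """
    merged = defaultdict(list)
    for book in books:
        if topic not in book["topics"]:
            continue
        for term, pages in book["index"].items():
            merged[term.lower()].append((book["title"], sorted(pages)))
    return dict(sorted(merged.items()))


def cooccurrence_counts(text, terms):
    """Count how often pairs of index terms appear in the same sentence."""
    counts = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        # Naive substring match; "pole" would also match "poles".
        present = sorted({t.lower() for t in terms if t.lower() in sentence})
        counts.update(combinations(present, 2))
    return counts
```

Keeping the book title next to each page list means the combined index still records which book a reference came from, which is what makes the relations between books explicit.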

2.2.3 Phase III: Prepare tools

  • TODO Study 50 examples
    While figuring out how to download all the Springer back matter, we started by looking at 100 examples and, in the process, obtained 50 back matter PDFs. Study these to find out what is present and what sorts of processing will be needed.
  • TODO Filter out indices
    An index has some features which distinguish it from most other pages in a book, such as short lines of text in alphabetical order with numbers in between them. If we could automatically pull out just the index pages from the back matter, that would save a considerable amount of effort.
  • TODO Standardize formats
    For the purpose of getting started, we picked some format for our indices. Once we have gained some experience using it, but before proceeding to enter hundreds of indices, we should re-evaluate the format and standardize it properly.
  • TODO Automated formatter
    As far as is practical, automate the process of identifying headings, subheadings, etc. in an index extracted as a text file from a book.
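
The index-filtering idea above might be prototyped with a simple score built from the two features mentioned: lines that end in page numbers, and first letters that run in alphabetical order. The sketch assumes the page text has already been extracted from the PDF as plain text; the function names, the 80-character limit and the 0.6 threshold are illustrative guesses that would need tuning on the 50 examples.

```python
import re

# A line ends "like an index entry" if it finishes with page numbers,
# e.g. "Cauchy integral formula, 112, 118-120".
TRAILING_PAGES = re.compile(r"[,\s]\s*\d+(?:\s*[-–,]\s*\d+)*$")
MAX_ENTRY_LENGTH = 80  # assumed cutoff for a "short" line


def index_page_score(page_text: str) -> float:
    """Return a score in [0, 1]; higher means the page looks more index-like."""
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return 0.0
    # Fraction of lines that are short and end in page numbers.
    entries = sum(
        len(ln) <= MAX_ENTRY_LENGTH and bool(TRAILING_PAGES.search(ln))
        for ln in lines
    )
    entry_fraction = entries / len(lines)
    # Fraction of adjacent lines whose first letters are in alphabetical order.
    initials = [ln[0].lower() for ln in lines if ln[0].isalpha()]
    pairs = list(zip(initials, initials[1:]))
    ordered_fraction = sum(a <= b for a, b in pairs) / len(pairs) if pairs else 0.0
    return entry_fraction * ordered_fraction


def looks_like_index_page(page_text: str, threshold: float = 0.6) -> bool:
    """Apply an assumed threshold to the score."""
    return index_page_score(page_text) >= threshold
```

Subentries break the alphabetical order of first letters, so heavily nested indices will score lower; multiplying the two fractions is a deliberately conservative choice that can be relaxed if too many index pages are missed.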

2.2.4 Phase IV: Large scale operations

  • TODO Copy Springer data from Einstein server
    This might have to wait until Cameron is available.
  • TODO Other publishers
    Look at other publishers and harvest indices where feasible.
  • TODO Public domain
    Locate all the math books on Internet Archive, download copies, and start extracting indices.
  • TODO Set up organization for index project
    Once we get to the full scale of thousands of books, we will need some sort of organizer system to keep track of which book indices are ready, which are in various states of preparation, and which ones have special issues, so that none get lost.
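
As a starting point for the organizer, each book index could be tracked as a small record with a preparation stage and a list of open issues. The stage names and field names below are hypothetical placeholders, not an agreed workflow; the sketch only shows the shape such a tracker might take.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stages; the real workflow may have more or different ones.
    LOCATED = "located"
    DOWNLOADED = "downloaded"
    EXTRACTED = "extracted"
    FORMATTED = "formatted"
    READY = "ready"


@dataclass
class IndexRecord:
    book_id: str
    stage: Stage = Stage.LOCATED
    issues: list[str] = field(default_factory=list)


def summarize(records):
    """Return (count of records per stage, ids of records with open issues)."""
    by_stage = Counter(record.stage for record in records)
    flagged = [record.book_id for record in records if record.issues]
    return by_stage, flagged
```

A summary like this answers the two questions raised above: how many indices are at each stage, and which ones need special attention.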

3 Organization

4 Software

4.1 Hypernotebooks

Work in progress on Ray's computer.

Date: 2016-04-20 Wed

Author: PlanetMath development team

Org version 7.9.3f with Emacs version 24
