I have recompiled the July Wikipedia for Pocket PC and Palm, this time with aliases.
Posts by ErikZachte
-
> Is the TR version with images also available somewhere as a download? It doesn't have to be free.
> Only €30 for the DVD is pretty steep!
Michael: where do you find a highly skilled programmer willing to work for €16 net income per hour? That's what is left after costs and 41% income tax. So at best you pay for one hour of programming. In 5 years I have spent at least 1500 hours on this project: also 'pretty steep'. Please don't complain. Erik
-
Image version yes. Text-only version in a few days (I am preparing new editions from the same July dumps).
-
TomeRaider for Pocket PC has supported tables for 5 years. The Palm version will never support tables unless Yadabyte reinvents itself. Based on comments from others, Mobipocket is far superior to TomeRaider Palm, and TomeRaider Pocket PC is far superior to Mobipocket. Anyone can download the preview and judge for him/herself.
-
The German Wikipedia with images for Windows/Pocket PC is now available.
Order now!
Based on the July 2007 data dump: 663,000 articles, 308,000 images, 3.1 GB.
And of course with aliases.
I will now focus on other language editions first, then possibly look into Palm optimizations later.
-
I have downloaded all images for the German, French and Spanish Wikipedias (a process that took weeks). Now the process of compiling final releases for Pocket PC begins, starting with the German release.
Hopefully the 400,000+ non-math images and 125,000+ math formulas (also images), together with 2 GB of text, will fit in a 4 GB file. Otherwise, add nearly a week for each rerun with other parameters (and 400,000 newly resized images).
Oh, and aliases will be included.
For a preview ready for download see the progress report
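A rough back-of-the-envelope check of the size budget mentioned above, as a sketch only. It assumes that a gigabyte means 2^30 bytes and that the 4 GB file holds nothing but the text and the images. The result is an average per image, not a measured size, and the function name is made up for illustration.

```python
# Rough size budget for the image edition (illustrative assumptions only).
GIB = 2**30                      # assumption: 1 GB = 2**30 bytes

file_limit = 4 * GIB             # the 4 GB file mentioned above
text_size = 2 * GIB              # about 2 GB of text
image_count = 400_000 + 125_000  # non-math images plus math formula images

def average_image_budget(limit: int, text: int, images: int) -> float:
    """Return the average number of bytes available per image."""
    return (limit - text) / images

budget = average_image_budget(file_limit, text_size, image_count)
print(f"about {budget / 1024:.1f} KiB per image on average")
```

Under these assumptions the average comes out at roughly 4 KB per image, and since both image counts are lower bounds the real budget is somewhat smaller. That is why the resize parameters decide whether everything fits.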
-
I can fully understand Alexander's decision. Recent digging into the capabilities of TomeRaider for Palm, helped by Itsme, left me very disappointed. Until then I was partially blinded because the same files look good in TomeRaider for Windows.
Recent findings convinced me that TomeRaider for Palm capabilities are really insufficient, and actually almost give my project a bad name.
I built my scripts for the TomeRaider version that I know and use, namely Pocket PC, and added support for Palm merely as a service to the community.
TomeRaider for Palm has hardly progressed in recent years and has very basic functionality. Page rendering is utterly basic. No category search, no image pan and zoom, etc. In fact many Palm users don't know what they are missing until they see the Pocket PC version.
TomeRaider for Pocket PC is a completely different experience: page rendering closely follows the online Wikipedia in all aspects (exception: internal links to page sections). Nearly all HTML and CSS is supported, as are Unicode fonts, categories and an image browser. This huge difference comes from the lucky choice of delegating all page rendering to the built-in browser engine.
I will study the Mobipocket project myself. Last time I looked it was still markedly less capable in page rendering and navigation than TR3 for Pocket PC.
-
Itsme, did you get the mail I sent you soon after your post?
If not, please mail me at erikzachte@+++.com (nospam: read +++ as infodisiac). Cheers, Erik
-
After I saw the screenshots Itsme provided (thanks) I decided to do some digging. Luckily I could buy an old CLie (T625C) from a colleague to do tests. Unluckily the device ran into an error with almost everything I did with TR3 for Palm 3.1.10. Even hard resets made no difference: "SystemMgr.c, Line:185, Unimplemented". The error is listed on the web for all kinds of Palm applications, not specifically TomeRaider, so after 20 errors, many soft resets and a few hard resets I gave up.
Then I discovered the StyleTap Palm emulator for Pocket PC and used it to make some screenshots.
See this (incomplete) comparison between TR3 PPC and TR3 Palm.
Itsme:
> More use of bold for chapter-headlines and a few more CR/LF between
> chapters for better readability.
Now that I know TR3 for Palm is even less capable than I thought, I can make some adjustments.
-
Itsme:
1
> Downloading the files without a download manager keeps the
> files intact and the checksum correct ..
> it would be an idea to mention it on your download-site.
Thanks, done.
2
> Wishes for newer versions:
> More use of bold for chapter-headlines and a few more CR/LF between
> chapters for better readability. Examples for both are the chapters of cities
> (once more these ) and the different chapters in lists of date events
> (for example "24. März").
Palm version? The PPC version shows lines between headers using CSS, just like the default online Wikipedia skin (monobook). Can you post a screenshot?
3
> And last but not least: Is it possible to create a database where you can find
> words with an "Umlaut" at the beginning the same way as others (without
> regarding upper/lowercase)?
TR3 searches the index case-insensitively (I explicitly asked for this, as otherwise half of the links would not work). But index lookup apparently is case-sensitive.
Chances that Yadabyte will change this upon request are close to zero.
-
Evolino:
> Now there are 2 possibilities:
> - Faster because there are no aliases anymore
> - Faster because it was sorted already during compilation
> What do you think?
You're absolutely right. So I prepared another version for testing, one without aliases and without sort during compilation.
http://www.infodisiac.com/Wikipedia/DownloadFiles.html
Palm users please compare new WP_DE_PALM_TXT3.tr3 with earlier versions:
WP_DE_PALM_TXT.tr3: aliases, no sort
WP_DE_PALM_TXT2.tr3: no aliases, sort
WP_DE_PALM_TXT3.tr3: no aliases, no sort
If index lookup on WP_DE_PALM_TXT3.tr3 is slower than on WP_DE_PALM_TXT2.tr3, that would mean lookup speed is influenced by the sort, and that leaving out aliases has no effect on performance.
So far everyone has asked to reintroduce aliases anyway, but it would be good to know that lookup performance does not suffer.
Besides performance, one could also check whether the problem of not finding some articles through the index is correlated with sort or no sort.
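Why sorting could matter for lookup speed can be sketched as follows. This is only an illustration of the general idea: how TomeRaider actually stores and searches its index is not known here, and the function names and sample titles are made up.

```python
import bisect

def lookup_unsorted(titles, wanted):
    """Linear scan: works on any order, but checks entries one by one."""
    for position, title in enumerate(titles):
        if title == wanted:
            return position
    return -1

def lookup_sorted(sorted_titles, wanted):
    """Binary search: needs sorted input, but halves the range each step."""
    position = bisect.bisect_left(sorted_titles, wanted)
    if position < len(sorted_titles) and sorted_titles[position] == wanted:
        return position
    return -1

# Hypothetical sample index.
titles = ["Koblenz", "Berlin", "Ludwigshafen", "Aachen"]
sorted_titles = sorted(titles)

print(lookup_unsorted(titles, "Ludwigshafen"))
print(lookup_sorted(sorted_titles, "Ludwigshafen"))
```

A binary search can also miss entries when the list was sorted with a different ordering than the search expects. That would be one conceivable explanation for articles that cannot be found through the index, but it is a guess, not a diagnosis.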
-
> b) Most Town-Tables still missing, .........
> c) Still errors opening articles ..........
Sorry if I was not clear about this. I only created new versions without aliases; the ones with aliases are weeks old, still contain the bugs mentioned, and remain available only for comparison. Please try the ones without aliases to see the bug fixes.
-
New editions are online.
Changes:
1 Bug fixes and some workarounds as discussed earlier
2 No aliases
3 Input sorted by TR3 during compilation.
Make sure to refresh the page (F5); you should see a version with and without aliases, both for PPC and Palm.
http://www.infodisiac.com/Wikipedia/DownloadFiles.html
Please comment on differences in index lookup for editions with and without aliases, both for PPC and Palm.
Also please comment on the reported index lookup error on Palm, where some pages could not be found. I'm pretty sure the sort by my script is actually correct, but who knows what TR3 does while sorting the index again (apart from spending extra time).
-
For now I will finish the job as started, otherwise I'll keep restarting whenever someone finds something to fix. As said, people can then test whether it makes any difference. Then we know for sure. For later editions and the image version we can look again at whether there is still consensus to leave aliases in.
-
... this is a really cumbersome way of searching ...
That is exactly my point, I hope people who complain about speed can check whether this has been the cause.
For now I am generating text-only versions without aliases, so that people can compare speed. Let us wait for feedback to see what is best for the image version.
-
I removed aliases for the next release to see if that helps. If not, it is beyond my capacity to do something about it.
There is one other thing, possibly related (?): in the newest TomeRaider release one can open a search edit box while an article is still being displayed. The normal procedure was to switch to index view and then type a search entry. My impression is that search while displaying the index is faster than search while displaying an article. Possibly because in both modes the display is refreshed while one types, and refreshing an index is of course much faster than switching to a new article.
-
The missing texts are shown, but not as links, just plain texts. At least the article is readable again.
-
All: thanks for feedback. I will generate new text-only files in coming days. Actually I almost had them ready and uploaded till Elch33 reported serious bugs.
Elch33: I fixed both issues. I never saw links to the same page before. Too bad TR3 does not support links to a section within a page. (#... links)
Itsme:
Tables for small cities will now be rendered better but not completely correctly. I fixed some minor bugs, but the underlying issue is more far-reaching and will become more of an issue in the future.
MediaWiki introduced the current set of parser functions about a year ago. It is kind of a macro language.
http://meta.wikimedia.org/wiki/ParserFunctions
I added support for most of these functions, but not yet for #expr and #ifexpr, both of which can handle very complex calculations. Until recently these functions were hardly used, but now more hacks in templates use calculations.
MediaWiki parser function hacks are generally considered very ugly code, very inefficient, and hardly maintainable. I expect they will be replaced again (this is already a complete rewrite from earlier attempts) by something completely different. So I am not putting more effort into this anytime soon.
This means that code using #expr and #ifexpr can break the rendering.
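To give an idea of what such a function has to do, here is a minimal sketch of evaluating a numeric expression and choosing between two texts, in the spirit of #expr and #ifexpr. It is not the MediaWiki implementation and not the code used for these editions. It borrows Python's own expression syntax (so `==` rather than `=`), supports only a handful of operators, and the names `expr` and `ifexpr` are chosen purely for illustration.

```python
import ast
import operator

# Operators handled by this sketch (Python syntax, not the real #expr grammar).
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
COMPARE_OPS = {
    ast.Lt: operator.lt,
    ast.Gt: operator.gt,
    ast.LtE: operator.le,
    ast.GtE: operator.ge,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}

def evaluate(node):
    """Evaluate a small arithmetic syntax tree, rejecting anything else."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -evaluate(node.operand)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in COMPARE_OPS):
        left = evaluate(node.left)
        right = evaluate(node.comparators[0])
        return COMPARE_OPS[type(node.ops[0])](left, right)
    raise ValueError("unsupported expression")

def expr(text):
    """Hypothetical stand-in for #expr: value of a numeric expression."""
    return evaluate(ast.parse(text.strip(), mode="eval"))

def ifexpr(condition, then_text, else_text=""):
    """Hypothetical stand-in for #ifexpr: pick a text based on the expression."""
    return then_text if expr(condition) else else_text

print(expr("1 + 2 * 3"))
print(ifexpr("8.5 > 7.6", "east", "west"))
```

In a real template every argument may itself contain further template calls that have to be expanded first, which is where the number of evaluations grows quickly.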
So far such breakage occurs only in a few templates. Still 99% of all templates are parsed correctly. But these few exceptions are all templates that are often used (only then was it worthwhile to add such complicated code). For these cases, as far as I could find them, I provided some workarounds (sometimes stripping, sometimes replacing part of the template).
An example of clever but ugly code is putting a dot on a map of Germany based on longitude and latitude coordinates. In other extreme cases nested longitude and latitude calculations can invoke up to 300 parser functions just to get one result. The same guy designed the UI-wise utterly hopeless and inconsistent wiki-table syntax, which almost everyone dislikes now. Of course that is no excuse for lack of support in my parser; time constraints are an issue though.
Example for Koblenz:
http://de.wikipedia.org/w/inde…n_Deutschland&action=edit
which invokes
http://de.wikipedia.org/w/inde…lage:Lageplan&action=edit
-
I'm going to Wikimania in Taipei today, so I will not be able to answer questions for about a week.
-
Itsme, the only content that I filter out on purpose on en: and fr: is all kinds of self-referential housekeeping messages like "This article needs more references" or "This article is a stub, you can make it better". No other filtering is intentional. I have not done even that yet on de:.
I checked Koblenz and Ludwigshafen: clearly a parser error, as some of the underlying table syntax is shown. Those tables are built with templates, so if a template is not parsed well, all occurrences of that template show the same defect.
I will look into the missing cities when I start working on the image edition, but that won't be this month anyway.