Things I am doing. Desktop application

feedback, questions and discussion relating to the Complete BBC Games Archive (beta site now open!)
pau1ie
Posts: 690
Joined: Thu May 10, 2012 9:48 pm
Location: Bedford

Re: Things I am doing. Desktop application

Post by pau1ie » Thu Mar 01, 2018 11:14 pm

I am using an HTML table. That is why I put "spreadsheet" in quotes. I use a Python library (html.parser) to parse it (into an sqlite database, as it happens); Lee opens it with a spreadsheet program. I think HTML is more flexible than sqlite. You could parse it with grep and sed, or even Basic. To save myself work I would like to have one database extract if at all possible.
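
For anyone following along, here is a minimal sketch of that parse-into-sqlite step. It is not the actual application code: the two-column sample table, the ids, the `TableParser` class and the `games` table layout are all made up for illustration, and the real ss.php page has its own columns.

```python
import sqlite3
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &frac34; arrives as ¾
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample; the ids are invented for the example.
SAMPLE = """<table>
<tr><th>id</th><th>title</th></tr>
<tr><td>1</td><td>Elite</td></tr>
<tr><td>2</td><td>Secret Diary Of Adrian Mole Aged 13&frac34;, The</td></tr>
</table>"""

parser = TableParser()
parser.feed(SAMPLE)
parser.close()
header, *records = parser.rows

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO games VALUES (?, ?)", records)
titles = [row[0] for row in db.execute("SELECT title FROM games ORDER BY id")]
print(titles)
```

The parser only keeps cell text, so links or markup inside a cell would need extra handling.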

crj
Posts: 846
Joined: Thu May 02, 2013 4:58 pm

Re: Things I am doing. Desktop application

Post by crj » Fri Mar 02, 2018 12:20 am

pau1ie wrote:I am using an HTML table. That is why I put "spreadsheet" in quotes. I use a Python library (html.parser) to parse it (into an sqlite database, as it happens); Lee opens it with a spreadsheet program. I think HTML is more flexible than sqlite. You could parse it with grep and sed, or even Basic. To save myself work I would like to have one database extract if at all possible.
Normally, by choosing an HTML table you'd be heading for a world of hurt with content encodings. But I guess almost everything to do with old Acorn kit will be flat ASCII. Just make sure pound signs survive every translation and transmission step. (-8

sqlite is rather better at giving you back the bytes you put in.
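
A quick way to check that claim, as a sketch only: store a string containing a pound sign, a double space and a ¾, then read it back. The table name `t` and the sample string are invented for the test.

```python
import sqlite3

# Made-up test string: pound sign, double space and a vulgar fraction.
title = "Price \u00a39.95  Aged 13\u00be"
raw = title.encode("utf-8")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (title TEXT, raw BLOB)")
db.execute("INSERT INTO t VALUES (?, ?)", (title, raw))

got_title, got_raw = db.execute("SELECT title, raw FROM t").fetchone()
print(got_title == title, got_raw == raw)
```

This only exercises the database itself; anything that exports the data to another format can still mangle it.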

crj
Posts: 846
Joined: Thu May 02, 2013 4:58 pm

Re: Things I am doing. Desktop application

Post by crj » Fri Mar 02, 2018 12:30 am

pau1ie wrote:You could parse it with grep and sed, or even Basic.
Hmm. That's set me thinking that if you had something more modern attached to a Beeb it would be pretty easy to give BBC BASIC a nice interface to sqlite...

Code:

*attach mydb path.to.my.database/sqlite
X=OPENIN("sqlite:SELECT author,game,filename FROM mydb.games")
REPEAT
  INPUT#X,author$,game$,filename$
UNTIL EOF#X
CLOSE#X
*detach mydb
...or something like that. (-8

pau1ie
Posts: 690
Joined: Thu May 10, 2012 9:48 pm
Location: Bedford

Re: Things I am doing. Desktop application

Post by pau1ie » Fri Mar 02, 2018 1:06 pm

crj wrote:by choosing an HTML table you'd be heading for a world of hurt with content encodings.
The bbcmicro.co.uk/ss.php page is in UTF-8 according to my browser. As I say, Lee already uses it, so it has to work. The only title I had issues with was:

Secret Diary Of Adrian Mole Aged 13¾, The

That works fine, though I think I had to tell the browser the page is UTF-8 to get it to work properly. The ¾ is U+00BE, which UTF-8 encodes as the two bytes C2 BE. I suppose that an 8-bit computer would have problems with that, so I would have to consider how to deal with it if I ever get round to building a menu: use } in mode 7 (which displays as ¾ there), change it to 3/4, or just lose any non-7-bit-ASCII characters. But that is way in the future.
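
A rough sketch of the "change it to 3/4, or lose it" options, assuming a small hand-written replacement table. The `FRACTIONS` mapping and the `to_7bit` name are hypothetical, and the leading space in each replacement is a choice made here so that 13¾ does not turn into 133/4.

```python
# Hypothetical mapping: spell out the common vulgar fractions.
FRACTIONS = {"\u00bc": " 1/4", "\u00bd": " 1/2", "\u00be": " 3/4"}


def to_7bit(title: str) -> str:
    """Return a 7-bit ASCII version of a title."""
    for char, spelled in FRACTIONS.items():
        title = title.replace(char, spelled)
    # Anything still outside 7-bit ASCII is silently dropped.
    return title.encode("ascii", "ignore").decode("ascii")


print("\u00be".encode("utf-8").hex())  # the UTF-8 bytes of the fraction
print(to_7bit("Secret Diary Of Adrian Mole Aged 13\u00be, The"))
```

Dropping characters silently would also lose pound signs, so a real menu builder would probably want a longer table.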

Sitting at work as I am now, the HTML table wins because it can be opened in Excel. I am not sure this would be possible with sqlite. Also, both Excel and LibreOffice Calc can open the table directly from the web page, which is nice.

crj
Posts: 846
Joined: Thu May 02, 2013 4:58 pm

Re: Things I am doing. Desktop application

Post by crj » Fri Mar 02, 2018 4:14 pm

As I say, you might be OK in this application.

However, I would ask: is this the same title? "Secret Diary Of Adrian Mole Aged 13³⁄₄, The"

Also, I would ask how you cope if you fetch the page in a hotel room and the hotel's "transparent" proxy transcodes to iso-8859-1.

Also, do any titles contain double spaces? If so, it would be prudent to transform the extra spaces into en spaces, then convert them back afterwards, giving consideration to whether it's necessary to cope with the &ensp; HTML entity as well as the bare Unicode character.
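
The en-space trick could look something like this. It is a sketch only: the function names are invented, and it assumes no title legitimately contains an en space (U+2002) to begin with.

```python
import re

EN_SPACE = "\u2002"


def protect_spaces(title: str) -> str:
    """Turn every space that directly follows another space into an en space."""
    return re.sub(r"(?<= ) ", EN_SPACE, title)


def restore_spaces(title: str) -> str:
    """Undo protect_spaces() after the text has been through HTML."""
    return title.replace(EN_SPACE, " ")


protected = protect_spaces("Double  Space")
print(repr(protected))
print(repr(restore_spaces(protected)))
```

The first space of each run is left alone, so HTML whitespace collapsing has nothing left to collapse.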

Also, in a retrocomputing context, how confident are you that any data you store as HTML will be interpreted the same way by tools available in 2048?

HTML is a format for presenting information to a user, not for robust, durable storage. Personally, I'd want to be sure of getting all my bits back, so would use sqlite or similar. /-8

pau1ie
Posts: 690
Joined: Thu May 10, 2012 9:48 pm
Location: Bedford

Re: Things I am doing. Desktop application

Post by pau1ie » Fri Mar 02, 2018 5:00 pm

crj wrote:is this the same title?
That is a philosophical question. I think I am going to say I don't care. I suspect Lee typed 3/4 into a spreadsheet and it was auto-corrected to ¾. Does that matter? Would it have been better to leave it? Would ³⁄₄ have been better? I don't think I mind. For my purposes they are the same. (For those following along, the original has one character, crj's example has three. Try highlighting it.)
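
For anyone who would rather check than highlight, a small sketch using Unicode compatibility normalisation (NFKC) from the standard library. The `fold` helper is a made-up name; whether two titles that fold to the same string count as "the same" is still the philosophical part.

```python
import unicodedata

single = "13\u00be"              # "13¾": one precomposed fraction character
triple = "13\u00b3\u2044\u2084"  # "13³⁄₄": superscript 3, fraction slash, subscript 4


def fold(text: str) -> str:
    """Compatibility-normalise so visually equivalent spellings compare equal."""
    return unicodedata.normalize("NFKC", text)


print(len(single), len(triple))      # lengths differ
print(single == triple)              # raw strings differ
print(fold(single) == fold(triple))  # folded forms match
```

Note that the folded form is 1, 3, 3, fraction slash, 4, so it is useful for comparing titles but not something to display.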
crj wrote:iso-8859-1
I expect your point is that (depending on how the proxy mangles things) it will end up looking like this.

Secret Diary Of Adrian Mole Aged 13Â¾, The

You often see this type of problem. I would prefer to use https to stop the proxy being able to mess with the content I am delivering.
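
A sketch of how that kind of mangling arises, assuming the UTF-8 bytes are delivered unchanged but then interpreted as ISO-8859-1. A proxy that genuinely transcodes could behave differently.

```python
title = "Secret Diary Of Adrian Mole Aged 13\u00be, The"

wire_bytes = title.encode("utf-8")          # what the server sends
mangled = wire_bytes.decode("iso-8859-1")   # what a mis-labelled reader sees
repaired = mangled.encode("iso-8859-1").decode("utf-8")  # reversible if nothing was lost

print(mangled)
print(repaired == title)
```

The repair step only works while every byte survives; once something replaces an unknown character with "?", the original is gone.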
crj wrote:Also, in a retrocomputing context, how confident are you that any data you store as HTML will be interpreted the same way by tools available in 2048?
Extremely confident. I think it is similar to my being able to open an ASCII text file from my BBC Micro in Notepad now. HTML is pretty much the most common file format; it is widely documented, open and easily accessible. UTF-8 is the most common encoding. The sqlite database format is likely to have changed between now and then, so you would have to find an old version of the code and compile it to be able to read it. Doable, but I am confident that UTF-8 encoded HTML will "just work".

crj
Posts: 846
Joined: Thu May 02, 2013 4:58 pm

Re: Things I am doing. Desktop application

Post by crj » Fri Mar 02, 2018 5:42 pm

pau1ie wrote:Extremely confident. [...] notepad
Funny you should mention Notepad... that's natively UTF-16 and can cause a lot of damage in transcoding UTF-8 back and forth.
pau1ie wrote:I suspect Lee typed 3/4 into a spreadsheet and it was auto-corrected to ¾. Does that matter?
To me it does, because my hope is that somebody, somewhere, will have a definitive archival-grade list. It would be a shame to have something that only got 99% of the way there, because a lot of effort would have to be duplicated before anybody could start working on that last 1%.

pau1ie
Posts: 690
Joined: Thu May 10, 2012 9:48 pm
Location: Bedford

Re: Things I am doing. Desktop application

Post by pau1ie » Fri Mar 02, 2018 8:43 pm

crj wrote: definitive archival-grade list
This is absolutely not what we are trying to achieve. I do try to be as open with what we are doing as possible. In practice I think the HTML table isn't as bad as you are concerned it might be. Lee uses it to maintain the site, so it feeds back on itself.

The other thing that might help is the mysql dump which I update by hand periodically. This is probably closer to what you want. If you build something that uses it, I will investigate generating it on demand.

It is interesting (though somewhat academic, as I am not interested in going down that route) to wonder what an archival-grade list would mean. Is the title what was written on the front of the cassette, or what was displayed when the program was run, or what was displayed in the adverts? Presumably, where they differ, all of these would have to be logged. I think anyone doing this would find they have more than 1% to do, though Lee has done amazing work with the bbcmicro site metadata.

crj
Posts: 846
Joined: Thu May 02, 2013 4:58 pm

Re: Things I am doing. Desktop application

Post by crj » Sat Mar 03, 2018 12:01 am

Being enough of a Douglas Adams fan to have seen the Hitch-Hiker's v. Hitch Hiker's v. Hitchhiker's confusion, I do appreciate that difficulty. Though IMDb does try to preserve all variant names of films, and BoardGameGeek does the same for board games. It would be good if someone, somewhere was doing the same for computer games.

Then again, I'm not volunteering, so I know I can't complain. (-8

pau1ie
Posts: 690
Joined: Thu May 10, 2012 9:48 pm
Location: Bedford

Re: Things I am doing. Desktop application

Post by pau1ie » Sun Jan 13, 2019 11:28 am

Must be something about winter that made me think about this again, and the fact that as things stand it is impossible to link the screen shots in the archive download file with a game entry. So I am revisiting this, and have more or less unilaterally decided that I will use the following scheme for file names inside the zip file:

Code:

t/title-id.ext
Where:
  • t is the first character of the title, upper-cased (or zero if it is a digit)
  • title is the title of the game without the article, truncated at the first "(", "[" or ",", with all non-alphanumeric characters removed.
  • id is the id. This is what I really want to have in the file name so they can definitely be uniquely identified.
  • ext is the extension
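
The rules above can be sketched roughly as follows. This is a reading of the description, not the code running on the site: the `archive_name` function, the example ids and the .ssd extension are hypothetical, only ASCII letters and digits are kept, and a leading article ("The ...") is not handled because the titles here carry it after a comma.

```python
import re


def archive_name(title: str, game_id: int, ext: str) -> str:
    """Build a t/title-id.ext name from a game title (illustrative only)."""
    # Truncate at the first "(", "[" or ","; this also drops a trailing ", The".
    stem = re.split(r"[(\[,]", title, maxsplit=1)[0]
    # Remove everything that is not an ASCII letter or digit (an assumption).
    stem = re.sub(r"[^A-Za-z0-9]", "", stem)
    first = stem[0] if stem else "0"
    # Titles starting with a digit are filed under "0".
    folder = "0" if first.isdigit() else first.upper()
    return f"{folder}/{stem}-{game_id}.{ext}"


print(archive_name("Secret Diary Of Adrian Mole Aged 13\u00be, The", 123, "ssd"))
print(archive_name("3D Grand Prix", 7, "ssd"))
```

Because the id is part of the name, two games whose cleaned titles collide still get distinct files.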
I know some people wanted other stuff, in particular publishers, but I don't understand, given that we have the spreadsheet (ss.php), why this would be useful. It would lead to long filenames and the names would be mangled by applying the above rules anyway.

Speaking of the spreadsheet, I intend to include that in the file as well to make sure that whenever it is downloaded all the metadata comes with it, and all the work that goes into producing the site won't be lost if it drops off the internet. I'll probably include a readme with some waffle and the generated date as well. The spreadsheet currently has a file name column, but those are the names on the server not in the archive, so I will include another column containing the file name in the archive.
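
As a rough illustration of bundling the metadata with the files, here is a sketch using Python's zipfile module. The site itself may build the archive quite differently; the `build_archive` function, the "ss.html" and "README.txt" names and the dummy disc image are all assumptions.

```python
import io
import zipfile
from datetime import datetime, timezone


def build_archive(files: dict, spreadsheet_html: str) -> bytes:
    """Return zip bytes holding the game files, the spreadsheet and a readme."""
    buf = io.BytesIO()
    generated = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        # Metadata travels with the files, so a downloaded copy is self-describing.
        zf.writestr("ss.html", spreadsheet_html)
        zf.writestr("README.txt", f"Generated {generated}\n")
    return buf.getvalue()


archive = build_archive({"E/Elite-1.ssd": b"\x00" * 16}, "<table></table>")
names = zipfile.ZipFile(io.BytesIO(archive)).namelist()
print(names)
```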

I have more or less coded this. If it will cause a problem to you in any project you have, please let me know and I will try to accommodate you, but I am keen not to let the perfect become the enemy of the good.

pau1ie
Posts: 690
Joined: Thu May 10, 2012 9:48 pm
Location: Bedford

Re: Things I am doing. Desktop application

Post by pau1ie » Thu Jan 17, 2019 12:42 pm

That is all done now.

leenew
Posts: 3936
Joined: Wed Jul 04, 2012 3:27 pm
Location: Doncaster, Yorkshire

Re: Things I am doing. Desktop application

Post by leenew » Mon Jan 21, 2019 9:58 pm

Thanks Paul,
Looks to be working well 🙂

Lee.

shawty
Posts: 133
Joined: Sun Feb 03, 2019 3:03 pm
Location: North East England

Re: Things I am doing. Desktop application

Post by shawty » Mon Feb 04, 2019 10:35 pm

pau1ie wrote:
Mon Feb 26, 2018 2:09 pm
I am working on a desktop application for the bbcmicro website. At present it doesn't do much, but the plan is for it to download the website and give similar functionality. At present it downloads what I call "Lee's spreadsheet" at bbcmicro.co.uk/ss.php and populates a local database which is then used to generate a screen which looks pretty much the same as the bbcmicro website. The next thing I want to work on is to add the screenshots, and the disc images, then I will be able to get it to fire up the games in an emulator.

It will also need polishing. At present while downloading and parsing the spreadsheet it just hangs, and the operating system reports it is not responding. I ought to put some progress bars in there and use threads so the main application doesn't hang, but I don't know what I am doing - I am learning all this as I go.

Longer term I would like to be able to create menus and disc images for the MMB (Micks images would be better for this) and datacentre with menus generated from the metadata in the database. This will take years though.

Anyway I thought I would post to provoke discussion. I am writing it in python and pyqt so it can run on Windows and Linux, so hopefully other operating systems will also work. I should upload it to github so anyone can see it. Is anyone interested in it before it works properly? Can anyone else help, support, advise, encourage or even heckle?

Any ideas for a name?
I've not read all the thread YET :-)

What's your target platform? Just the BBC? or are you thinking bigger?

If you're thinking cross-platform (e.g. Windows, *nix and Mac), you'd be better off using something like Electron or even Google's new cross-platform framework "Flutter". That would then allow you to:

A) Embed the views in your app using HTML
B) Deliver the data from the site via a mechanism like JSON, in a structured format rather than just dealing with flat files.

If you only want to support Windows, then a standard .NET Windows Forms app with an embedded browser using the CefSharp toolkit will very likely make things much easier for you.

PM me if you wanna have a chat about details.
Been around I.T. since the Beeb was born, maybe you've seen me around :D

pau1ie
Posts: 690
Joined: Thu May 10, 2012 9:48 pm
Location: Bedford

Re: Things I am doing. Desktop application

Post by pau1ie » Mon Feb 04, 2019 11:01 pm

I was targeting Linux, Windows and probably Mac, though I won't ever buy one, so I just hope that will work due to the cross-platform nature of Python. I haven't really considered mobile.

I am using pyqt as it is one of the nicer cross platform toolkits, and I like the fact it is free/open source. Also I have wanted to learn it for a while.

It is kind of stalled at the moment, as I needed to change the website to make the file names for the screenshots and the disc images match up with the database. That is done now, but I haven't restarted work on this, though I expect I will at some point. I consume data from the website in an html table (bbcmicro.co.uk/ss.php). It is a compromise as this is used by a human (Lee) to populate a spreadsheet, and I wanted to repurpose it for this. This is where you start to wish you had used a proper framework for the website, but then those tend to be quite resource hungry, so might not have worked as well for what it does.

Thanks for your interesting ideas!

shawty
Posts: 133
Joined: Sun Feb 03, 2019 3:03 pm
Location: North East England

Re: Things I am doing. Desktop application

Post by shawty » Mon Feb 04, 2019 11:12 pm

Ok, let me have a think about the desktop app, I can already think of some better ways, possibly using C# as the language (All open source and cross platform now by the way... MS doesn't own it anymore, they handed it all over to the .NET foundation) ... back on track.

Let me have a think about things, and I'll throw some ideas your way, while you tinker on with the website.

In terms of the website, given that you're using PHP, you might want to take a look at "Laravel" if you want an easy-to-use web application framework. Tons of documentation and tutorials, and a massive community.
