Skip to content

Problem Database Update 2: This is going to be awesome

May 31, 2012

We’ve now had a few meetings and discussions about the physics problem database, and I now know enough to say with a healthy certainty that this is going to be an awesome collaboration.

First, I am completely amazed by the response I’ve gotten from people interested in helping with this project. Over 20 people responded to my initial interest form. And the talents these people are bringing are extraordinary. Here are just a few of the volunteers who responded:

  • Aaron Titus, one of the original co-creators of WebAssign
  • Shawn Cornally, creator of BlueHarvest, founder of ThThTh industries.
  • Alemi, founder of the Virtuosi and Python master
  • The eponymous Mylene, author of the great blog Shifting Phases.
  • Evan Weinberg, author of Gealgrobophyiculus, willing to work from the other side of the planet.

Truly, I’m astonished that we can assemble a crack team of international developers to work on this project—the internet rocks again, as always. But perhaps the coolest thing about our development team was how many people responded to say they didn’t know much about software development/web design/databases, etc, but they would love to learn and help out in any way possible. This really got me thinking—what if we could do this project in such a way that not only do we create this problem database, but the process we use to create this database can help actually help people to learn these skills?

This gives me a good idea for a mission for our new Global Physics Code Shop, not just to write software, but to bring people into the process of creating software and help them learn how to add to our efforts, which should make for better software. What better way to learn web development than by jumping into a real project that you will ultimately end up using?

I also think that experienced programmers will also benefit from trying to explain what they are doing so that it can be understood by others. I constantly find myself falling into the trap of learning just enough about a new language or tool to just “get by,” but if I force myself to then go back and try to explain it to someone else, I’m going to need to learn it much more deeply, and this will likely translate into insights that will lead to better code and ultimately, better software.

So that’s what we’re going to try to do. We’re going to try to document our work on this project as we go along in such a way that even the most novice user, who hasn’t done anything with database design, or doesn’t even know the basics of HTML can follow along with our efforts, and with enough dedication, make significant contributions to this project. This means that as we develop things, we’re going to try to do more than just put in a few comments. We’re going to try to create documentation (blog posts, screencasts, and whatever else we can think of) that tries to teach what we’re doing.

Description of v1.0 of Physics Problem Database

Here is the minimum set of functions we want the Physics Problem Database to be able to do:

  • Allow users to login and write their own problems in a variety of formats (multiple choice, free response, numerical response, etc).
  • Allow users to add solutions to a problem.
  • Allow users to add comments to a problem.
  • Allow users to add attachments (photos, pdfs) to problems, comments or solutions.
  • Allow users to tag problems and then search for those problems by tags.
  • Allow users to create a list of problems and then display those problems to be printed as an assessment to be given to a student.

There’s obviously a ton of other functionality we’d like to add down the road, like having students be able to log-in, keeping track of what problems have been used by what students, or being able to import and export problems into different formats. Hopefully all of these will be added in good time, but for now, we want to be somewhat agile and try to get something useful out the door as quickly as possible.


Based on the expertise of the people working on this project, we’ve decided to develop the database using PHP and MySQL. MySQL is the standard in databsae development (this will hold all of our problem information), and PHP is a scripting language that runs on a webserver and can dynamically generate webpages using the information stored in the database.

One other tool we will be using is Laravel, a PHP framework, which is essentially a library of code designed to make it easier to develop web applications. Laravel will make it much easier to things like access control, or really fancy stuff like Dropbox integration.

In addition to to work in PHP, we’re going to need to do a significant amount of work in HTML5 and CSS to layout the interface for the app, and probably also in Javascript and the JQuery framework to add any interactivity we want on the user’s machine.

Database schema

The first task we have to accomplish is developing a schema for our database that decides exactly what information we will be storing in our database, and defines the relationships between various objects in our database. In our most recent discussion following the weekly Global Physics Department meeting, we made some headway into planning this out, and then Alemi and Aaron stepped up with some excellent suggestions.

Andy and I had a quick conversation today to discuss a few more details (link to recording of that conversation). Based on all these suggestions, I then tried to pull everything together and work up a first draft of the schema in MySQL Workbench, a great free app that lets you design databases (and much more).

Here’s the link to the MySQL workbench document I created: PPD database schema.mwb.

Also here’s a pdf of the schema:

View this document on Scribd

And finally, here’s a short video where I try to explain the schema for the database and ask a few questions for your consideration.

Two quick questions I have after developing this schema:

  1. I’m not sure we need a many to many relationship between tags and users. Shouldn’t that just be a 1 to many relationship, as in 1 user can create many tags? We aren’t going to have tags associated with multiple users, are we? Or might be need to find all of the users who have used a particular tag?
  2. I’m a bit confused by why the attachments table has 6 different foreign keys created by MySQL Workbench—there are 2 for each relationship between problems, solutions, and comments. In our first iteration, I think every image you upload is going to be associated with either a problem, solution or comment. We won’t do any fancy stuff to try to re-use an image to be associated with both a comment and a problem, for example. So I’m wondering what happens to the problem and solution foregin keys if I upload an image associated with a comment. Will they just be NULL to indicate that there is no relationship?

Organization and next steps

We haven’t yet figured out exactly how we’re going to organize development. Ultimately, I think we will be breaking into teams that are working on various aspects of the project (backend programming, interface design, etc). Our first step is to finish setting up a repository on GitHub which will allow us to share the code for the database, and allow you to download a working copy to test out on your own computer, modify and resubmit for inclusion in the code we ultimatly deploy to be used by everyone.

Andy is already playing around with the Laravel framework and setting up PHP/mySQL on his windows machine, and he’s already created a short screencast that shows you how to get started with doing the same.

Once I get the GitHub Repository fully set up, I’m also going to set up a blog over there, and will move long posts like this to that page as well, so my readers who don’t want to have anything to do with this project don’t have be bothered by these posts.

That’s it, so far. I’d love to know what you think If you’ve got a lot of experience in coding, I’d love any suggestions or critiques you have for our work so far. If you don’t have any experience, and are interested in helping, I’d love to know what you think of our efforts so far and if you find our screencasts and explanations helpful. I certainly found it helpful to try to talk my way thought the design instead of just throwing it together.

18 Comments leave one →
  1. Andy "SuperFly" Rundquist permalink
    June 1, 2012 12:34 am

    quick update to the Laravel stuff: I made a simple app with a users table and a problem table. All I’ve checked out so far is the user authentication. It works quite well, though it took an hour to figure out that if you don’t name your field “username” in the database, you have to use an associated array when using the Auth::attempt function. But, it works now, and I’m feeling more and more confident with my laravel abilities.

  2. June 1, 2012 7:34 am

    John and Andy, this is really exciting! I have a bunch of problems I would love to include in the database. I am also interested in your idea about involving teachers who don’t currently code, but who are willing to learn and want to learn. I would like to get better at coding, for sure! (Right now I pretty much only can do the basics of vpython that I learned in the M&I class)

  3. June 1, 2012 9:43 am

    This sounds like an amazing project, but I guess like any non-coder, it seems a bit intimidating. I would like to at least continue to follow the progress of this, and offer my input where I can. I will look at the videos you embedded when I’m on a real sized computer 🙂

  4. June 1, 2012 12:00 pm

    Unless you want to search for users by tags, I’m not sure why you’d need tags associated with users at all. If tags are only used to identify / classify problems, then they should only need to be related to the problems. (I think. I’ve done a bit of relational database work before but I’m no expert.)

    • June 1, 2012 9:11 pm

      I think we want to make it possible for faculty to have their own sets of tags, in addition to the tags created by others and the canonical set, so I think we are going to need this relationship.

      • June 2, 2012 12:54 am

        Hi John, I think I see where both you and joshg are coming from.

        The reason for having a relationship between Users and Tags, as I see it, is to present the user with a shorter list of options when creating a problem. I’m imagining tags as a drop-down list, or some other “choose from” control. I don’t want that control to be populated with every tag in the system. I also don’t want to have to guess/remember what I called my tags (for example, by typing in the first few letters); I want to be able to browse them, which means you need a way to keep track of which tags I want to browse.

        Next point: should it be 1-many or 1-1. Am I understanding this right: as a user, I would have access to canonical tags and my own user-tags, but not others’ user tags? This has the advantage that I am not presented with an impossibly long list of tag options. It has the disadvantage that if I work closely with someone, we can’t share our tags.

        Under this scenario, yes, you would want a one-to-many relationship between users and tags. I can only use tags I create (aside from canonical tags), and tags I create can only be used by me.

        However, what if you allowed people to “subscribe” to tags instead of “owning” them? Anyone could create a user-tag. I could browse the list and decide if I want to subscribe to a tag that already exists (say, Class A Amps). If I don’t feel like being bombarded with someone else’s ideas, maybe I create my own Class A Amps tag (I should be automatically subscribed to a tag I create). There would be two tags with the same name (Class A Amps), but so what? In the browse dialog box, you could show the usernames of people already subscribed to that tag. If they are the names of my colleagues, I know I’ve found the right one.

  5. jsb16 permalink
    June 1, 2012 11:00 pm

    Depending on how the tag list comes up, I can imagine tagging a problem and seeing, say , “relativistic bananas” in the list of possible tags and wondering not just “What’s a problem that should be tagged that?” but also “Who writes problems like that and what other interesting problems have they written?”

    As an administrator/support-type-person, someone might also wonder “Is ‘cfvpom’ an intentional tag or a typo for ‘cvpm’?” and being able to ask the person who created it might save a great deal of frustration.

    • jsb16 permalink
      June 1, 2012 11:14 pm

      (The many-to-many relationship between tags & people is needed in case Kelly O’Shea decides to help me write problems involving relativistic bananas.)

      Am I correct in thinking that the intent of the comments is to allow meta-discussions of the problems? (Such as discussions of whether a problem is better used while working on the unbalanced forces model or saved until after the energy transfer model or left as an extension for the really bright kid who would otherwise be trying to figure out how to bypass the safety cutoff on the gas spigots.)

  6. June 2, 2012 1:28 am

    Hi John, a few more ideas. If this is jumping the gun into a later stage of decision-making, just put it on the back-burner…

    Your Point 2 above, foreign keys: the only reason I know of to add those extra foreign keys all over the table is to speed up queries. My suggestion would be to get rid of them for now, and create a normalized reference design. If you have future performance issues, you can add them back in. They create a risk of update anomalies (a form of database corruption), so they are best avoided.

    Point 3, referential integrity: I can’t tell if MySQL Workbench is recording the join-types, or if that happens in a later stage of the process. Have you made decisions about how/whether to enforce referential integrity? This comes up, for example, when a user decides they want to delete their account. Will all of their problems be deleted? What about tags they created, and comments they posted? The same thing will occur if users have the opportunity to delete a solution they posted. Do all the associated comments get deleted? Etc. This will become an issue again when you are querying the database, and have to decide whether to use inner joins (which return only the

    Point 4, support tables: It may be helpful to have small support tables for anything that you eventually hope to be able to “choose from a list”. I’m thinking of things like problem format, level, type, etc. Creating little tables that contain nothing other than “Format Id, Format Title” (maybe with a “notes” field”) can make life easier in the long run when the list gets long and slow to index, or you decide you want to “retire” a format without deleting all the problems associated with it, etc. The “institution” that users belong to might benefit from the same treatment. A GUI convention that shows a drop-down control with a button next to it simply labelled […] would allow people to easily add a new item to the table if none of the existing ones suited them, but could save space, cut down on frustrating missed results due to typos, etc.

    Point 5, style conventions: It looks like a solution has a “creation_date” as well as a “created_at” field. If this isn’t a typo, what’s the distinction? It’s probably going to be helpful to keep field names consistent with their parallels in other tables. Speaking of field names, any thoughts about naming conventions? Depending on the database platform, I’ve sometimes done things like use prefixes for object type (t for Table, q for Query, i for Integer, etc etc) if the development environment presents objects in lists where it’s possible to mistake one for another. Also, it’s nice not to have to continually look up the data type of a field, if that info can be conveyed in the name.

    Point 6, shared attachments: I think you addressed this in the screencast, but I wanted to be sure: as it is, a single attachment can not be used for multiple problems. I can appreciate the desire to reduce complexity. At the same time, uploading attachments is likely to be one of the bottlenecks in the user experience, and storing them is likely to be a bottleneck in resource requirements, so it would be great if attachments could be reused somehow (I realize that it creates a UI design problem, figuring out how to let people retrieve an existing attachment and “connect” it to a new problem, solution, or comment). On the upside, I don’t think you’d need 3 intersection tables — one intersection table could (I think) contain the attachment, and one of either a problem id, a solution id, or a comment id.

    • June 2, 2012 1:19 pm

      I’ve always hated it when the ITS staff delete user accounts, leaving me with files in the file system that have no owner. Even worse is when they reuse the numbers, so that the files are owned by the wrong person. Referential integrity should be maintained, but the best way to do that is *not* to delete accounts, but to mark them as obsolete. That way you don’t have dangling pointers and a big cleanup problem, but just a responsibility to make sure that obsolete records are not properly displayed (as “account deleted”, rather than giving the name of the user, for example).

      I agree with Mylène that attachments are going to present upload and storage problems unless you allow reuse. The WordPress “media gallery” provides a good model for handling this situation.

      • June 3, 2012 11:26 pm

        Adding a field to mark users as obsolete is a great idea. I bet that Laravel already has some sort of media management tool/plugin created. I’ll look around to see what I can find.

      • June 4, 2012 12:43 am

        A user may need/want to know that their info is no longer stored… they will experience that as a “deletion” (regardless of whether their info is gone or simply hidden). I know I’m wary of any online service that makes it difficult (or impossible) for me to delete my account (I’m looking at you, Facebook). I can even imagine a school administrator requiring a user to stop participating in this project (I can’t think of any *good* reasons, but I can think of some reasons).

        This is not incompatible with GSWP’s point. It is quite possible to remove identifying details from a user record (maybe replace the person’s name with “Deleted User” and leave everything else blank) while maintaining the record, and its key, intact. Their comments, problems, solutions, etc. remain available to the community, and it saves database management headache. (It also allows them to rejoin at a later date and resume ownership of their records). Incidentally, I would name the field “UserActive” (not “UserObsolete”), to reduce the probability of someone misinterpreting what “True” means in this case.

    • June 3, 2012 11:32 pm

      If I understand correctly, you’re saying we should pull the keys that reference two tables, like problems_Users_id foreign key in the comments table and comments_problems_User_id in the attachments?

      Haven’t thought at all about what to do when a user deletes their account. I was probably not going to implement the option in V1.0.

      Thanks for the the other tips and pointers. The “creation date” and “created_at” was a typo. Laravel requires it be “created_at” and I must have forgotten that change.

      I’m going to see if I can find some sort of library/plugin to make managing the attachment issue easier to manage.

      • June 4, 2012 12:56 am

        Hi John,

        Re: keys — I hesitate to agree with that summary. It’s true in this case but not true in all cases. The take-away is this: a table needs the primary keys of the tables it is connected to. Comments is connected to Problems; so the primary key of Problems has to become a foreign key in Comments. Comments is also connected to Users; so the primary key of Users has to become a foreign key in Comments.

        In each case, the primary key of one table becomes a foreign key in an associated table. There is no need for a foreign key in one table to become an (even more) foreign key in another table.

        Hm. It makes me wonder if MySQL workbench knows which one of those keys is primary. I don’t see any graphical indication of which key is the unique ID for the table — is there some way to record that? If not, the software may be intepreting the keys as being a single compound key, which might explain why they all show up in the intersection tables.

        • June 4, 2012 3:42 pm

          Thanks so much. All of the id keys in the various tables are marked by MySQL workbench as primary keys (I think it does this automatically to the first thing you add to the table).

  7. Aaron Titus permalink
    June 3, 2012 8:25 am

    Installing Laravel on a Mac was a bit of a challenge. It requires mcrypt which is not compiled into PHP by default. I’m using OS 10.6 (Snow Leopard), but it mcrypt is not installed with PHP on 10.7 (Lion) as well.

    I found the following web pages helpful:

    Furthermore, when copying php.ini.default to php.ini, my php pages were no longer displayed. (i.e. they were displayed as pain text and not processed by the PHP scripting engine). I commented out all lines in the php.ini file (literally it does absolutely nothing), and then it worked. I don’t know why. But now shows that mcrypt is installed.

    Next, the Laravel installation instructions are fairly vague. It says, “Extract the Laravel archive and upload the contents to your web server.” That sounds really easy! Except that it gave an error in one of the php files that needed write access to the laravel/storage folder. So, I gave all folders in laravel/storage write privleges by the web server.

    Now, laravel/public/index.php gives the appropriate splash screen in the web browser.


  8. Aaron Titus permalink
    June 3, 2012 8:28 am

    I have a question for Andy (or anyone else who as installed Laravel). It seems like only the Laravel public folder should be in one’s document root (for the web server). Where should the other Laravel folders go? Right now, I have the entire Lavavel folder in my web documents as instructed by the Laravel documentation, but that doesn’t feel right to me.

    • June 4, 2012 3:42 pm

      If I understand it, Laravel want you to put none of it in the web folder, but then put in an alias in the web folder, pointing to the Public folder. On my pc, I just did what you did, but on the server we should think about this.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: