Alternative Databases & Puresim

BryanK
Posts: 140
Joined: Sun Aug 21, 2005 4:25 pm

Post by BryanK »

There are a couple of variants of the Lahman database floating around. The GamboAnakit database in particular starts with the Lahman database and adds a bunch of neat little things: it adjusts players' debut dates to account for minor league service, adds players from the Negro Leagues (I think), splits players who had careers as both pitchers and hitters, etc. But the database also normalizes the stats. I was curious whether this would break Puresim's import algorithms. Based on what I know about the XML, I get the feeling that the normalization would cause some issues, but I figured I'd ask to be sure.

I'd go ahead and run a test, but at present the database files exist only in CSV form. If I sat down and invested an hour or so, I could build an Access database out of them for use with Puresim. I just wanted some input before doing so.


Excerpts from the database readme regarding normalization:

15. Standardized batting stats to a 790-run environment using a method
devised by Bill James. He uses this in his "The New Bill James
Historical Abstract" to understand a player's batting ability
in a more normal context. For example, if a player hits .238, with
10 HRs and 57 RBI while playing a pitcher's park and where runs are
hard to come by. Is this player any better than another player who
hits .265 with 12 HRs and 72 RBI? The player in question is Willie
Davis in 1965, where runs were at a premium and he played in
pitcher friendly Dodgers Stadium. The .265/12/72 RBI are Willie's
stats standardized were he playing in an average park and where the
league as a whole would score about 750 runs per season. I moved
this to 790 to more closely reflect the scoring rate over the past
10 years. If you want to read
more about it, see page 740 (Willie Davis comment) in James' book.
I did this to everyone, including pitchers.

19. Batting stats for position players are now based on career totals,
per 500 at-bats. All seasons, except the rookie season, will have
these stats. For their rookie season, they keep their original
stats if they had over 500 at-bats. If they had less, then the
numbers are 75% of their per-500 at-bats ratios.

20. Pitching stats are now based on a pitcher's performance relative to
the league and the year, taking their park factor in to account.
They were then totalled for their career and then based on a
standard universe in which pitchers struck out 6.5 per 9 IP, walked
3.5 per 9 IP, etc. These ratios are the MLB ratios for the past 10
years. In short, I did something similar to the pitchers like I did to the
batters, which is explained above in #15.

21. Batting and Pitching stats are now a players career average except
for their rookie season, in which case it is this:
For a pitcher: if a pitcher pitched at least 150 IP, he retains
those stats, else they are 15% worse than their career average.
For a position player: if they had at least 400 AB, they keep their
stats, else they are 15% worse than their career averages.
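
To make item 19 concrete, here is a minimal Python sketch of the per-500 at-bat scaling it describes. This is not the database author's code. The `Line` class, the function names, and the sample numbers are all hypothetical, and the rookie branch assumes that "75% of their per-500 at-bats ratios" means every counting stat, at-bats included, is multiplied by 0.75; the readme does not spell that out.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """A hypothetical, cut-down batting line (counting stats only)."""
    ab: int
    h: int
    hr: int
    rbi: int


def per_500(career: Line) -> Line:
    """Scale career totals to a 500 at-bat season (readme item 19)."""
    if career.ab <= 0:
        raise ValueError("career at-bats must be positive")

    def scale(stat: int) -> int:
        return round(stat * 500 / career.ab)

    return Line(ab=500, h=scale(career.h), hr=scale(career.hr), rbi=scale(career.rbi))


def rookie_line(rookie: Line, career: Line, share: float = 0.75) -> Line:
    """Rookie season: keep the real stats if over 500 AB, else a reduced line.

    ASSUMPTION: the reduced line scales every counting stat, including AB,
    by `share`. The readme only says "75% of their per-500 at-bats ratios".
    """
    if rookie.ab > 500:
        return rookie
    base = per_500(career)
    return Line(
        ab=round(base.ab * share),
        h=round(base.h * share),
        hr=round(base.hr * share),
        rbi=round(base.rbi * share),
    )
```

With made-up career totals of 5000 AB, 1400 H, 160 HR and 720 RBI, `per_500` gives a 500 AB, 140 H, 16 HR, 72 RBI line. Per item 19, every non-rookie season for that player would carry this same line, which is the part most likely to matter for anything that expects real season-by-season numbers.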
Amaroq
Posts: 807
Joined: Wed Aug 03, 2005 5:29 pm
Location: San Diego, California

RE: Alternative Databases & Puresim

Post by Amaroq »

I don't know, but I'm very interested to hear a response from anybody who has played with this - or Shaun, or you if you decide to test it!
BryanK
Posts: 140
Joined: Sun Aug 21, 2005 4:25 pm

RE: Alternative Databases & Puresim

Post by BryanK »

In short... we'll never know. Apparently converting the CSVs into an Access database is a more involved task than I had anticipated. I imported all of the CSVs into Access and cleaned up the column headings, but Puresim didn't seem to like it, so I must be missing something.
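
One way to narrow down what is missing would be to compare each CSV's header row against the column names of the matching table in a stock Lahman database. The Python sketch below does only that comparison. The `header_diff` name is hypothetical, and the column list in the demo is an illustrative subset assumed from the Lahman Batting table; which columns Puresim actually checks is not stated anywhere in this thread.

```python
import csv
import io
from typing import Iterable, TextIO


def header_diff(handle: TextIO, expected: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (missing, extra) column names, compared case-insensitively.

    `handle` is an open CSV file (or any text stream); only its first row is read.
    """
    header = next(csv.reader(handle), [])
    found = {name.strip().lower(): name.strip() for name in header}
    wanted = {name.lower(): name for name in expected}
    missing = [wanted[key] for key in wanted if key not in found]
    extra = [found[key] for key in found if key not in wanted]
    return missing, extra


if __name__ == "__main__":
    # ASSUMPTION: illustrative subset of Lahman Batting columns, not Puresim's real list.
    expected_batting = ["playerID", "yearID", "teamID", "lgID", "AB", "H", "HR", "RBI"]
    sample = io.StringIO("playerID,yearID,teamID,AB,Hits,HR,RBI\n")
    missing, extra = header_diff(sample, expected_batting)
    print("missing:", missing)
    print("extra:", extra)
```

For a real file, the stream would come from something like `open("Batting.csv", newline="")`. A clean header comparison would not prove the import will work, since column types and table names in Access could still differ, but it rules out the simplest cause.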
puresimmer
Posts: 2117
Joined: Sun Jul 24, 2005 3:39 pm

RE: Alternative Databases & Puresim

Post by puresimmer »

Where is this download? I'll take a look.
Developer, PureSim Baseball
Steely Glint
Posts: 594
Joined: Tue Sep 23, 2003 6:36 pm

RE: Alternative Databases & Puresim

Post by Steely Glint »

Shaun, you can get it at http://www.bruceM.net - it's listed on the upper right - and also be sure to download a copy of the very useful Baseball Chronology while you are there.
“It was a war of snap judgments and binary results—shoot or don’t, live or die.“

Wargamer since 1967. Matrix customer since 2003.
Steely Glint
Posts: 594
Joined: Tue Sep 23, 2003 6:36 pm

RE: Alternative Databases & Puresim

Post by Steely Glint »

You should also look at Ankit's alternate databases, which IMO are the best out there. You can download them at http://www.baseballmaelstrom.com/ankit/