Lahman Import Errors?

PureSim Baseball is the ultimate baseball fan's toy, with support for both casual and hardcore baseball fans.

Moderator: puresimmer

KG Erwin
Posts: 8366
Joined: Tue Jul 25, 2000 8:00 am
Location: Cross Lanes WV USA

Lahman Import Errors?

Post by KG Erwin »

For designers of historical season templates: this is worth mentioning again. It seems that certain players (outfielders) import from Lahman in incorrect field positions. So now, when I create a template, I go back through each team to make sure the primary/secondary positions are historically correct. For my purposes, if a player started more than 10 games in a position, then I make that a secondary position.

I use the fielding information from http://www.baseball-reference.com (assuming, hopefully, that it is accurate).

That being said, my previously released templates available on Padres Fan's site may NOT be 100% correct.

Another thing to note: I do NOT change any OF ratings manually. I let the game do any adjustments and re-evaluate the players.
Frozen Stiffer
Posts: 1059
Joined: Fri Aug 19, 2005 8:18 pm
Location: California, USA

RE: Lahman Errors

Post by Frozen Stiffer »

ORIGINAL: KG Erwin

Another thing to note: I do NOT change any OF ratings manually. I let the game do any adjustments and re-evaluate the players.

KG,

Pardon me if I am missing the obvious, but... if you don't do it manually, how DO you do it? Do you drop the player in the appropriate outfield spot and let him 'learn' the position? Do you change the position, but none of his ratings?

I apologize in advance if I missed something.
"It ain't braggin' if you can do it."

-Hall of Fame pitcher Jerome 'Dizzy' Dean

RE: Lahman Errors

Post by KG Erwin »

I simply change the positions in "edit bio", but I don't change the ratings numbers. The game changes the numbers automatically.

I AM uncertain about how many games a player should play a given position before making it a secondary one, though. 10 games is an arbitrary number, and probably could be adjusted down.

What's really odd is that some players import correctly for a given year, but others differ greatly. This in turn makes me wonder how accurate the actual stats are. [&:]

Edit & addition: before you guys start thinking I'm overly obsessive about this stuff, the reason why I do this is to make sure that the autogenerated AI-controlled lineups are reasonably accurate. There are many mistakes in the Lahman database, and the incorrect position listings can make a big difference in the course of a season.
GNDN
Posts: 179
Joined: Mon Jun 25, 2007 1:50 am
Location: Albany, NY

RE: Lahman Errors

Post by GNDN »

I am curious, too, since I noticed it in my replay association.  After I finish my season (only the WS to go!  Woo Hoo!), I planned on fixing my players.
Nobody leaves this place without singing the blues....

RE: Lahman Errors

Post by KG Erwin »

Ok, what I'll do is restart an historical replay association from scratch, and illustrate the differences in the lineups of a given team. This will take a bit of time, so bear with me. [;)]

Here's how the 1947 Dodgers imported, BEFORE spring training:



Attachment: UO0001.jpg (157.67 KiB)

RE: Lahman Errors

Post by KG Erwin »

Now, here's the actual positions played (LF-CF-RF only) for the '47 season:

Player         G   PO   A   E  DP     FP  RF/G  LF  CF   RF
-----------------------------------------------------------
DWalker      147  261   9  10   0   .964  1.84   0   0  147
CFurillo     121  287   9   7   3   .977  2.45  28  93    2
PReiser      108  240   3   3   0   .988  2.25  51  62    0
GHermanski    66  105   5   2   0   .982  1.67  64   1    3
DSnider       25   48   0   1   0   .980  1.92   4  13    7
AVaughan      22   49   0   0   0  1.000  2.23  22   0    0
AGionfriddo   17   29   1   2   0   .937  1.76  11   0    6
EMiksis       11   25   0   0   0  1.000  2.27  11   0    0
DLund          5   11   0   0   0  1.000  2.20   5   0    0
TBrown         3    4   0   1   0   .800  1.33   3   0    0
TTatum         3    2   0   0   0  1.000  0.67   2   0    1
DWhitman       3    7   0   0   0  1.000  2.33   2   0    1
MRackley       2    7   0   0   0  1.000  3.50   0   1    1
-----------------------------------------------------------

Note that Al Gionfriddo is rated in-game as a CF, whereas in reality he played 11 games in LF and 6 games in RF. Dixie Walker is rated as a RF and LF, but played 147 games in RF only. Many other teams have these inconsistencies. For the Dodgers, it's not that big of a deal, but for others, it IS a big deal.
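As a rough illustration of the rule described earlier in the thread (the most-played spot becomes the primary position, and any other spot with more than 10 games becomes a secondary), here is a minimal Python sketch. The function name `assign_positions` and the 10-game default are assumptions for illustration only, and the table above lists games played rather than games started, so treat the results as approximate.

```python
from typing import Dict, List, Tuple

# Games at each outfield spot, copied from the 1947 Dodgers table above.
GAMES_1947: Dict[str, Dict[str, int]] = {
    "DWalker": {"LF": 0, "CF": 0, "RF": 147},
    "CFurillo": {"LF": 28, "CF": 93, "RF": 2},
    "PReiser": {"LF": 51, "CF": 62, "RF": 0},
    "GHermanski": {"LF": 64, "CF": 1, "RF": 3},
    "AGionfriddo": {"LF": 11, "CF": 0, "RF": 6},
}


def assign_positions(games: Dict[str, int], threshold: int = 10) -> Tuple[str, List[str]]:
    """Return (primary, secondaries) for one player.

    primary     -- the spot with the most games
    secondaries -- every other spot with MORE than `threshold` games
    The threshold is an arbitrary cut-off (hypothetical default of 10).
    """
    # Sort spots by games, most-played first.
    ranked = sorted(games.items(), key=lambda item: item[1], reverse=True)
    primary = ranked[0][0]
    secondaries = [pos for pos, count in ranked[1:] if count > threshold]
    return primary, secondaries


if __name__ == "__main__":
    for name, games in GAMES_1947.items():
        primary, secondaries = assign_positions(games)
        print(name, "primary:", primary, "secondary:", secondaries)
```

Lowering `threshold` makes more spots qualify as secondary positions, which is the adjustment being weighed when the 10-game figure is called arbitrary.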

RE: Lahman Errors

Post by KG Erwin »

After making historical adjustments and autoarranging the lineups, the team looks like this:





Attachment: UO0002.jpg (157.68 KiB)

RE: Lahman Errors

Post by GNDN »

OK, I think I see what you are doing... when you first mentioned you did this, I assumed you went into the DB and modified each player's record with games played at each position. Not an exciting prospect.
 
But you do not go into the DB, you simply edit the bios in PS and let the AI make the changes.  Can I assume that you use baseball reference to come up with the number of games at each position? 
 
 
 
 

RE: Lahman Errors

Post by KG Erwin »

ORIGINAL: GNDN

OK, I think I see what you are doing... when you first mentioned you did this, I assumed you went into the DB and modified each player's record with games played at each position. Not an exciting prospect.

But you do not go into the DB, you simply edit the bios in PS and let the AI make the changes.  Can I assume that you use baseball reference to come up with the number of games at each position? 



Yes, I use Baseball Reference as my guidepoint. More must be said -- perhaps individual point ratings should be adjusted, but I don't have the expertise to make those judgments. For the Dodgers, the historical changes made perfect sense. Carl Furillo didn't play until late in the year, and the primary LF for most of the season was Gene Hermanski.

What must be made clear is that none of this can be blamed on PS. The errors are contained within the Lahman database. It IS a pain in the ass to go thru each team and make them accurate, but if you wanna play out a series of seasons, these errors add up.




Finally, it must be said that if you wanna use real players, you're gonna have to do some extra work. Players who changed positions will have to be treated differently, as mentioned elsewhere. Judgments must be made on "how historical do you wanna be"? I have my own ideas about alternative history. In my opinion, the smallest change can have vast repercussions on the timeline. That's how I approach the game.
jeremy7227
Posts: 161
Joined: Tue Jan 24, 2006 5:45 pm

RE: Lahman Errors

Post by jeremy7227 »

KG -

The issue is in PS. I think this is actually a good thing because it is something that can probably be fixed, rather than a DB issue, which would require some real heavy lifting.

Look at the attached screen shots. In the LahmanPSDB table "Fielding", Gene Hermanski is player "hermage01" and his position is listed as "OF". But there is another table in the Lahman DB called "FieldingOF". In this table the outfielders are listed with their breakdown by position. I suspect, but I have no way of confirming, that PS looks only in the "Fielding" table and not in the "FieldingOF" table. When importing players whose primary position is "OF", the import function should perform a secondary lookup in the "FieldingOF" table to determine the player's primary position.
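The secondary lookup suggested above might look something like this in Python. This is only a sketch: the dictionaries stand in for the "Fielding" and "FieldingOF" tables, the key layout and the `resolve_position` helper are hypothetical, and nothing here reflects how PS actually imports players. Hermanski's 64/1/3 split is taken from the 1947 breakdown KG posted earlier in the thread.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (player id, season); an assumed key layout

# Stand-in for the "Fielding" table: one generic position per player-season.
FIELDING: Dict[Key, str] = {
    ("hermage01", 1947): "OF",
    ("nobody01", 1947): "OF",  # hypothetical player with no OF breakdown
}

# Stand-in for the "FieldingOF" table: games at each outfield spot.
FIELDING_OF: Dict[Key, Dict[str, int]] = {
    ("hermage01", 1947): {"LF": 64, "CF": 1, "RF": 3},
}


def resolve_position(player_id: str, year: int) -> str:
    """Return the most specific position available for a player-season."""
    pos = FIELDING[(player_id, year)]
    if pos != "OF":
        return pos  # already specific, no second lookup needed
    breakdown = FIELDING_OF.get((player_id, year))
    if not breakdown:
        return pos  # no breakdown on file, so keep the generic "OF"
    # Pick the outfield spot with the most games.
    return max(breakdown, key=lambda spot: breakdown[spot])


if __name__ == "__main__":
    print(resolve_position("hermage01", 1947))
    print(resolve_position("nobody01", 1947))
```

Falling back to the generic "OF" when no breakdown exists keeps the import from guessing a spot the data does not support.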

Hope that helps and maybe it is something Shaun can address.

Jeremy


Attachment: Fielding_Table.jpg (74.61 KiB)

RE: Lahman Errors

Post by jeremy7227 »

And here is "hermage01" in the "FieldingOF" table in Lahman...



Attachment: Lahman_FieldOFTable.jpg (116.99 KiB)

RE: Lahman Errors

Post by jeremy7227 »

Also, it should be noted that Baseball Reference was built with the Lahman DB as its foundation, and that in turn was built with RetroSheet as its foundation. So the data is all there for us; it just needs some massaging. I think we can do more with the Lahman DB to customize the experience when playing PS without using PS to make the edits. I have been experimenting with modifying the DB, but honestly haven't had the time to give it as much attention as I would like.

Some things, like the issue you outline above, are things I think the game should address. Others are areas where I think customizing the DB would create a more accurate import of some players.

RE: Lahman Errors

Post by GNDN »

KG, thanks for the information.  Since I plan on going from a 2007 replay straight into 2008 and beyond, I am going to use MLB.com to grab my numbers.  I will keep this in mind however since I was thinking about re-playing the Mets from their first season.

RE: Lahman Errors

Post by KG Erwin »

Jeremy, thank YOU for your insights into the Lahman database. So, this is an issue that I hope Shaun will address, as it definitely affects users of real players. For now, I suppose it's back to manually editing the rosters. In a 16-team association, it's not a huge undertaking, but after expansion sets in, it becomes more problematic.

(Personal note: as I mentioned to a fellow gamer, I feel somewhat uncomfortable in criticizing a game that I love, but in this case, hopefully it is constructive criticism.)

RE: Lahman Errors

Post by jeremy7227 »

KG - I agree, and I don't mean this as a knock on PS. I play this game for hours a week when I can, and I am only hoping that by pointing out possible ways to streamline player import, the game will be able to achieve an "out of the box" accurately playable experience with real players for folks who don't have the experience or patience to make the manual edits. If, like us, you play a lot and see the power of the game, you don't mind doing some of the manual setup. But new users might see it out of context as a major glitch. It isn't, but perception is reality.

Also - to GNDN - I don't think you will find this issue in recent seasons with real players. The Lahman DB does list the primary OF position for players in the 1990s and 2000s at their actual OF spot. For earlier seasons the DB only lists OF in the Fielding table (see above), but the POS breakdown is in the FieldingOF table. So this is only an issue if you play back in the earlier eras.

J

RE: Lahman Errors

Post by GNDN »

Jeremy, thanks.
 
You just saved me a few hours of editing.  Your next beer is on me.  [8D]
dneely
Posts: 207
Joined: Sat Aug 20, 2005 12:03 am

RE: Lahman Errors

Post by dneely »

KG:

Don't worry! Nothing you are saying is wrong or, more importantly, meant to hurt Shaun or his game. The game still needs improvement in various areas, and the Lahman database is not perfect. You/we should post our observations, suggestions and ideas as they come forth. I have never had any feeling that Shaun is mad when solid and thoughtful comments are made by obvious fans of PureSim such as yourself!
DNeely

PureSim Vet

RE: Lahman Errors

Post by KG Erwin »

Thanks, Dave.  I'm working on a corrected 1947 template, which I will submit for inclusion on Padres Fan's site.

RE: Lahman Errors

Post by dneely »

Glenn:

Please let me know when your new 1947 template is ready to go!!