The following article was written by guest author Bob Coret, who holds the copyright. It is published here with his permission:
Don’t want to lose (parts of) your genealogical data?
A recent research report by Genealogy Online shows that genealogists run a high risk of losing (parts of) their genealogical data when they transfer a GEDCOM file from one family tree program or service to another. This happens because most family tree programs and services do not follow the GEDCOM specification to the letter, and because many undocumented “user-defined tags” are in use.
Nigel Munro Parker recently made his GEDCOM validator GED-inline [http://ged-inline.elasticbeanstalk.com/validate] available for re-use. GED-inline reads a GEDCOM file and checks whether it follows the rules of the specified GEDCOM version. You get a report almost instantly, free of charge. In addition to statistics, the report shows the number of warnings and user-defined tags, as well as a list of all warnings. Genealogy Online (a service for easily publishing your family tree online) recently deployed the open-sourced GED-inline in its infrastructure. Genealogy Online [https://www.genealogieonline.nl/en/] now checks every GEDCOM file it receives for online publication, and notifies the user when the file produces warnings.
To avoid losing genealogical information when it is transferred from “A” to “B”, agreements on how that information is recorded are of great importance. If both “A” and “B” adhere to these agreements, the information will come across properly, without any loss! The agreements about the format of genealogical information are laid down in the GEDCOM specification. At the time of writing, the most recent GEDCOM version is 5.5.5, which is published at http://www.gedcom.org [https://www.gedcom.org/].
As a genealogist, you do not have to dive into the GEDCOM specifications yourself. They are intended for the suppliers of family tree programs and services (more specifically, their developers). But you should make sure that the GEDCOM function of your family tree program or service adheres to them! After all, if a family tree program or service does not follow the GEDCOM specifications, there is a risk of information loss whenever your genealogical information is transferred.
As a genealogist, you can check the quality of your GEDCOM file too! If you’re not using Genealogy Online, go to GED-inline [http://ged-inline.elasticbeanstalk.com/validate] directly and upload your GEDCOM file, then look at how many warnings appear in the validation report. The number of warnings says nothing about your genealogical information itself, and it does not mean you did anything wrong. The warnings concern how well the GEDCOM file complies with the GEDCOM specification. If there are warnings, there is a good chance that another family tree program or service will not fully understand the file, and that information will be lost!
Another number to pay attention to in the GED-inline report is the “User-defined” count. This is the number of lines in the GEDCOM file that use a so-called user-defined tag. Such tags are valid within GEDCOM, but their meaning is not laid down in the GEDCOM specification, and these user-defined tags are often not publicly documented. So if program “A” stores certain information in a user-defined tag, chances are that program “B” does not know what that information is or what to do with it. In the best case, these values are imported as a comment; in the worst case, they are ignored. User-defined tags therefore also increase the risk of information loss.
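For readers curious about what such a count involves, here is a minimal sketch in Python. It is not how GED-inline works; it simply assumes the standard GEDCOM line layout (a level number, an optional cross-reference ID such as @I1@, a tag, and an optional value) and the GEDCOM convention that user-defined tags begin with an underscore. The function name and the sample records, including the tags _MILT and _PRIM, are hypothetical and only for illustration.

```python
import re

# A GEDCOM line: level number, optional cross-reference ID (e.g. @I1@), tag, optional value.
LINE_PATTERN = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def count_user_defined_lines(lines):
    """Return (total_lines, user_defined_lines) for an iterable of GEDCOM lines.

    Hypothetical helper: a line counts as user-defined when its tag starts with "_".
    """
    total = 0
    user_defined = 0
    for raw in lines:
        # Drop line terminators and a possible byte order mark at the start of the file.
        line = raw.rstrip("\r\n").lstrip("\ufeff")
        if not line.strip():
            continue  # skip blank lines
        total += 1
        match = LINE_PATTERN.match(line)
        if match and match.group(3).startswith("_"):
            user_defined += 1
    return total, user_defined


# Hypothetical sample file with two user-defined tags (_MILT and _PRIM).
sample = """0 HEAD
1 GEDC
2 VERS 5.5.5
0 @I1@ INDI
1 NAME John /Smith/
1 _MILT Served 1914-1918
1 BIRT
2 DATE 1 JAN 1890
2 _PRIM Y
0 TRLR"""

total, udt = count_user_defined_lines(sample.splitlines())
print(f"{udt} of {total} lines ({udt / total:.1%}) use user-defined tags")
# prints "2 of 10 lines (20.0%) use user-defined tags"
```

A real validator checks far more than this (allowed tags, record structure, value formats), but the sketch shows why the “User-defined” number is counted per line: each line carries exactly one tag.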
Genealogy Online’s ‘GED-inline validation statistics’ [https://www.genealogieonline.nl/en/GED-inline/] report shows that 1,215,130,449 lines of GEDCOM were inspected, 8,129,466 warnings were given (0.7%), and 93,365,260 lines contained user-defined tags (7.7%). Given these shocking numbers, you have to wonder: just how much genealogical data is lost during transfer?
What can you, as a genealogist, do to reduce the risk of information loss?
If, after checking the quality of your GEDCOM file, you find that there is a risk of information loss, contact the supplier of your family tree program or service. Ask them to improve their GEDCOM support (using as few user-defined tags as possible and documenting the ones they do use), so that parts of your genealogical data are not lost during export (and import)!
When you contact the supplier, you can send along the GED-inline validation report for your GEDCOM file and the link to www.gedcom.org, where the GEDCOM specifications are published. If the supplier does not consider the quality of the GEDCOM export (your genealogical data!) important, it may be time to look for another family tree program or service.