Don’t Want to Lose (Parts of) Your Genealogical Data?
The following is an article written by guest author Bob Coret
and is copyrighted by him. The article is published here with the
permission of Bob Coret:
A recent research report by Genealogy Online shows that genealogists run
a high risk of losing (parts of) their genealogical data when
transferring a GEDCOM file from their family tree program or service to
another. This is because most family tree programs and services do not
follow the GEDCOM specification to the letter, and because many
undocumented “user-defined tags” are used.
Recently, Nigel Munro Parker made his GEDCOM validator GED-inline [http://ged-inline.elasticbeanstalk.com/validate]
available for re-use. GED-inline reads a GEDCOM file and checks whether the
file follows the rules of the GEDCOM specification it declares. You get a
report almost instantly (and free of charge). Besides statistics, it shows the
number of warnings and user-defined tags, as well as a list of all
warnings. Genealogy Online (a service for easily publishing your family
tree online) recently deployed the open-source GED-inline in its
infrastructure. Genealogy Online [https://www.genealogieonline.nl/en/]
now checks every GEDCOM file it receives for online publication. When a
GEDCOM file triggers warnings, Genealogy Online notifies the
user.
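A GEDCOM file declares which version of the specification it follows in its header, in the VERS line under HEAD > GEDC. As an illustration of where a validator could find that declaration, here is a minimal Python sketch. It is not GED-inline's code; the function name declared_gedcom_version and the sample header (including the program name ExampleApp) are hypothetical.

```python
from typing import Optional


def declared_gedcom_version(lines: list[str]) -> Optional[str]:
    """Return the value of HEAD > GEDC > VERS, or None if the header lacks it.

    Hypothetical helper for illustration; not taken from GED-inline.
    """
    path: list[str] = []  # tag of the current line, preceded by its superiors
    for line in lines:
        parts = line.lstrip("\ufeff").strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level, tag = int(parts[0]), parts[1]
        if level == 0 and tag != "HEAD":
            break  # the header record is over; no version was declared
        del path[level:]  # forget tags at this depth or deeper
        path.append(tag)
        # HEAD > SOUR > VERS is the exporting program's own version,
        # so only the full HEAD > GEDC > VERS path counts.
        if path == ["HEAD", "GEDC", "VERS"] and len(parts) == 3:
            return parts[2]
    return None


# Hypothetical header; ExampleApp is a made-up program name.
header = [
    "0 HEAD",
    "1 SOUR ExampleApp",
    "2 VERS 7.1",
    "1 GEDC",
    "2 VERS 5.5.5",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR UTF-8",
    "0 TRLR",
]
print(declared_gedcom_version(header))  # 5.5.5
```

Tracking the full path of tags, rather than looking for the first VERS line, is what keeps the program's version (7.1 here) from being mistaken for the GEDCOM version.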
To avoid losing genealogical information when it is transferred from “A” to
“B”, agreements on how the information is recorded are of great
importance. If both “A” and “B” adhere to these agreements, the
information will come across properly, without any loss!
Agreements about the format of genealogical information are laid down in
the GEDCOM specification. The most recent GEDCOM version is 5.5.5,
which is published on http://www.gedcom.org [https://www.gedcom.org/].
As a genealogist you do not have to dive
into the GEDCOM specifications. They are intended for
the suppliers of family tree programs and services (more specifically,
their developers). But as a genealogist you should make sure that the
GEDCOM import and export of your family tree program or service adhere to the
GEDCOM specifications! After all, if a family tree program or service
does not adhere to them, there is a risk of losing
information whenever your genealogical data is transferred!
As a genealogist you can check the quality of your GEDCOM file too! If you’re not using Genealogy Online, just go to GED-inline [http://ged-inline.elasticbeanstalk.com/validate]
directly and upload your GEDCOM file. See how many warnings are in the
validation report. The number of warnings says nothing about your
genealogical information; you didn’t do anything wrong. The warnings
relate to how well the GEDCOM file complies with the GEDCOM specification.
If there are warnings, there is a good chance that another family tree
program or service will not fully understand the GEDCOM file, and
that there is a risk of information loss!
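To make “compliance with the specification” a little more concrete, here is a minimal Python sketch of a few structural rules a validator could check: the file must start with 0 HEAD and end with 0 TRLR, every line must have the form level, optional @XREF@, tag, optional value, and a line may be at most one level deeper than the line before it. This is an assumption-based illustration, not how GED-inline works; the function name basic_warnings and the sample lines are hypothetical, and a real validator also checks which tags are allowed where, how often they may occur, and the format of their values.

```python
import re

# GEDCOM 5.5.x line syntax: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")


def basic_warnings(lines: list[str]) -> list[str]:
    """Check a few structural rules (hypothetical helper, not a full validator)."""
    if not lines:
        return ["file is empty"]
    warnings = []
    if lines[0].strip() != "0 HEAD":
        warnings.append("line 1: file must start with '0 HEAD'")
    if lines[-1].strip() != "0 TRLR":
        warnings.append(f"line {len(lines)}: file must end with '0 TRLR'")
    previous_level = -1
    for number, line in enumerate(lines, start=1):
        match = LINE_RE.match(line.strip())
        if match is None:
            warnings.append(f"line {number}: not a valid GEDCOM line")
            continue
        level = int(match.group(1))
        # A line may be at most one level deeper than the line before it.
        if level > previous_level + 1:
            warnings.append(f"line {number}: level jumps from {previous_level} to {level}")
        previous_level = level
    return warnings


# Hypothetical export with one structural error.
sample = [
    "0 HEAD",
    "1 CHAR UTF-8",
    "0 @I1@ INDI",
    "1 NAME Johann /Schmidt/",
    "3 TYPE birth",  # skips level 2
    "0 TRLR",
]
for warning in basic_warnings(sample):
    print(warning)  # line 5: level jumps from 1 to 3
```

As in the GED-inline report, a warning here says nothing about the genealogy itself; it only flags a line that another program may misread.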
Another number that you should pay
attention to in the GED-inline report is the User-defined value. This
number is the count of lines in the GEDCOM file that use a
so-called user-defined tag. Such tags are valid within GEDCOM,
but their meaning is not laid down in the GEDCOM specification,
and these user-defined tags are often not publicly documented. So if
program “A” places certain information in a user-defined tag, chances
are that program “B” does not know what that information is or what it
should do with it. In the best case these values are included as a
comment; in the worst case they are ignored. So
user-defined tags also increase the risk of information loss.
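By GEDCOM convention, user-defined tags begin with an underscore, such as _RUFNAME or _LOC. A count like the User-defined value can therefore be sketched as counting lines whose tag starts with “_”. GED-inline presumably does something similar, but this Python sketch is an assumption, not its implementation; the function name count_user_defined_lines and the sample export (including where _RUFNAME is placed) are hypothetical.

```python
def count_user_defined_lines(lines: list[str]) -> int:
    """Count lines whose tag starts with an underscore (hypothetical helper)."""
    count = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # no tag on this line
        # Records may carry a cross-reference ID such as @L1@ before the tag.
        tag = parts[2] if parts[1].startswith("@") and len(parts) > 2 else parts[1]
        if tag.startswith("_"):
            count += 1
    return count


# Hypothetical export; the placement of _RUFNAME is made up for illustration.
export_lines = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME Johann Friedrich /Schmidt/",
    "2 _RUFNAME Friedrich",
    "0 @L1@ _LOC",
    "1 NAME Berlin",
    "0 TRLR",
]
count = count_user_defined_lines(export_lines)
print(f"{count} of {len(export_lines)} lines use a user-defined tag")  # 2 of 7
```

Both underscore lines are perfectly legal GEDCOM, yet a program that has never heard of _RUFNAME or _LOC can only guess what they mean.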
Genealogy Online’s ‘GED-inline validation statistics’ [https://www.genealogieonline.nl/en/GED-inline/]
report shows that 1,215,130,449 lines of GEDCOM were inspected,
8,129,466 warnings were given (0.7% relative to the number of lines), and 93,365,260 lines
contained user-defined tags (7.7% of all lines). With these shocking numbers,
you have to wonder: just how much genealogical data is lost in
transfer?
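The two percentages follow directly from the three totals. Note that 0.7% is the number of warnings divided by the number of lines; if a single line can trigger more than one warning, the share of affected lines may be somewhat lower. A quick Python check using only the figures quoted above:

```python
# Totals as quoted from Genealogy Online's GED-inline validation statistics.
lines_inspected = 1_215_130_449
warning_count = 8_129_466
user_defined_lines = 93_365_260

warning_share = warning_count / lines_inspected * 100
user_defined_share = user_defined_lines / lines_inspected * 100

print(f"warnings per line: {warning_share:.1f}%")  # 0.7%
print(f"lines with user-defined tags: {user_defined_share:.1f}%")  # 7.7%
# Put differently: about one line in every 13 uses a user-defined tag.
print(f"one such line per {lines_inspected / user_defined_lines:.0f} lines")
```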
What can you, as a genealogist, do to reduce the risk of information loss?
If you – after checking the quality of
your GEDCOM file – find that there is a risk of information loss,
contact the supplier of your family tree program or service. Ask them to
improve GEDCOM support (and minimize the use of user-defined tags and
document them), so that parts of your genealogical data are not lost
during export (and import)!
When you contact the vendor, you can send the GED-inline report for your GEDCOM file and the link to www.gedcom.org,
where the GEDCOM specifications are published. If the supplier does not
consider the quality of the GEDCOM export (your genealogical data!)
important, it may be time to look for another family tree program or
service.
8 Comments
I don’t agree.
I’m in the same German GEDCOM-L group as Dirk. We searched for a way to export
the German “Rufname” (the given name a person is actually called by). It is not a nickname, and there is no way to express it in any GEDCOM version. So we agreed on _RUFNAME as a new tag, and it works fine for all developers represented in the GEDCOM-L group.
Or take locations that are stored in a place management system. We agreed on this (a completely new record):
0 @@ _LOC
1 NAME {1:M}
2 DATE {0:1}
2 _NAMC {0:1}
2 ABBR {0:M}
3 TYPE {0:1}
2 LANG {0:1}
2 <> {0:M}
1 TYPE {0:M}
2 DATE {0:1}
2 <> {0:M}
1 _FPOST {0:M}
2 DATE {0:1}
1 _POST {0:M}
2 DATE {0:1}
2 <> {0:M}
1 _GOV {0:1}
1 _FSTAE {0:1}
1 _FCTRY {0:1}
1 MAP {0:1}
2 LATI {1:1}
2 LONG {1:1}
1 _MAIDENHEAD {0:1}
1 EVEN [|] {0:M}
2 <> {0:1}
1 _LOC @@ {0:M}
2 TYPE {1:1}
2 DATE {0:1}
2 <> {0:M}
1 _DMGD {0:M}
2 DATE {0:1}
2 <> {0:M}
2 TYPE {1:1}
1 _AIDN {0:M}
2 DATE {0:1}
2 <> {0:M}
2 TYPE {1:1}
1 <> {0:M}
1 <> {0:M}
1 <> {0:M}
1 <> {0:1}
How could you manage this using only the tags from any GEDCOM version?
Greetings from Germany, Stefan.
The website destroyed my posts.
Stefan
You can represent any type of name, not just a nickname, with the NAME.TYPE structure, which is mandatory anyway. The PERSONAL_NAME_PIECES structure with NAME_PIECE_NICKNAME is optional. So, for example, you could have:
n NAME
+1 TYPE RUFNAME
You can have as many name structures attached to an INDI record as you want.
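The suggestion above can be sketched as a small exporter that writes one NAME structure, each with its own TYPE line, per name of a person. This is a hypothetical Python illustration: the function names, the xref I1, and the sample person are made up, and whether a receiving program accepts RUFNAME as a TYPE value depends on the GEDCOM version and on the program.

```python
def name_structure(level: int, name: str, name_type: str) -> list[str]:
    """One NAME line plus its subordinate TYPE line (hypothetical helper)."""
    return [f"{level} NAME {name}", f"{level + 1} TYPE {name_type}"]


def individual_record(xref: str, names: list[tuple[str, str]]) -> list[str]:
    """An INDI record with one NAME structure per (name, type) pair."""
    lines = [f"0 @{xref}@ INDI"]
    for name, name_type in names:
        lines.extend(name_structure(1, name, name_type))
    return lines


# Hypothetical person whose Rufname is the second given name.
record = individual_record(
    "I1",
    [("Johann Friedrich /Schmidt/", "birth"), ("Friedrich /Schmidt/", "RUFNAME")],
)
print("\n".join(record))
```

The design trade-off compared with _RUFNAME: a program that does not recognise the RUFNAME value presumably still sees a standard NAME structure and can keep it as an additional name, whereas an unknown _RUFNAME tag may be dropped entirely.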
Information from FamilySearch (when asked about 5.5.5):
“The Church of Jesus Christ has the copyright on the Gedcom Specification since 1987. There has not been a legal transfer of the rights we have to the Gedcom Specification.”
So 5.5.5 is not a legal GEDCOM version.