panel-collected data (repeated source-target entries)
Posted: 28 Aug 2017 20:41
I am very new to network analysis, and I have not found an answer to the following within the Gephi documentation or forums. This is a network analysis question.
I am working with data for a network of 200+ organizations all communicating among themselves. The network is undirected.
Because the organizations cannot all be interviewed, a panel of 20 experts was asked which organizations communicate with which others, according to their knowledge. Not all experts knew all 200 organizations, but they all answered for all pairwise combinations of organizations they knew.
As a result, we have a dataset in which many of the source-target pairs are repeated. For example, (Organization ABC,Organization XYZ) may have been cited by 4 panelists, and is thus present four times in the dataset.
Also, because of the way the data was collected, 'mirror' pairs are also present. For example, another 3 panelists might have cited the (Organization XYZ,Organization ABC) pair, bringing that pair to 7 instances in the data, because communication as investigated here is undirected.
My options are:
1) Eliminate all repeats, mirrors included. In the example, keep only one instance of (Organization ABC,Organization XYZ) and delete the other 6. I can generate a count (=7) and store it in an additional variable.
2) Eliminate all the repeats for a given source-target combination, but not its mirror. In the example, keep only one each of (Organization ABC,Organization XYZ) and (Organization XYZ,Organization ABC). I can generate a count for each of those (=4 and 3).
3) Keep the data exactly as is: four instances of (Organization ABC,Organization XYZ) and three instances of (Organization XYZ,Organization ABC).
Are there pros and cons to each approach, or is one of them correct, and the others not?
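To make the three options concrete, here is a minimal Python sketch of what each one would do to the example pair. The edge list and the variable names (`edges`, `option1`, `option2`, `option3`) are made up for illustration; this is not Gephi code, only a way to show the resulting rows and counts.

```python
from collections import Counter

# Hypothetical toy edge list matching the example:
# 4 panelists cited (ABC, XYZ) and 3 cited the mirror (XYZ, ABC).
edges = (
    [("Organization ABC", "Organization XYZ")] * 4
    + [("Organization XYZ", "Organization ABC")] * 3
)

# Option 1: treat mirror pairs as the same undirected edge.
# Sorting each pair gives both orderings the same key.
option1 = Counter(tuple(sorted(pair)) for pair in edges)

# Option 2: collapse repeats per ordered pair only, so the
# mirror pair stays as a separate row with its own count.
option2 = Counter(edges)

# Option 3: keep every row as collected, with no counts.
option3 = list(edges)

print(option1)       # one row, count 7
print(option2)       # two rows, counts 4 and 3
print(len(option3))  # 7 rows
```

In options 1 and 2 the count could be stored as an edge weight column, which is the "additional variable" mentioned above.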