Preparing Textual Data for the General Inquirer
The general-purpose Java version of the
General Inquirer processes all the text within each of
the files contained in a specified folder. It produces one output
record of tag counts for each file, and each record can then serve as
a row in a statistical spreadsheet.
- Edit each file to remove any content that should not be
part of the analysis.
- Put the identifying information about each file in its file name.
- All file names within a folder should follow the same
format.
The General Inquirer creates columns for this file name
identification information in the output spreadsheet according to the
following procedure:
1) All the characters in a file name starting with the
first period are removed.
For example, ".txt" and ".doc" will be removed.
2) Each word in a file name (separated by spaces) is given a
separate ID field.
3) The ID fields are labeled ID1, ID2, etc.
It may be helpful to rename these columns with more
descriptive labels later on the statistical spreadsheet.
4) For the last word in the file name (which may be the only
word if there is but one):
The computer tests whether it begins with a
letter.
If it does, it then looks for a digit in the word.
If a digit is found, then all characters up to (but not including)
the digit form one ID field, and the characters
starting with the digit form a second ID field.
Some examples:
bush speech defense1
is made into 4 ID fields for the candidate name, the
type of document, the topic, and the serial number within that
group: (1) bush, (2) speech, (3) defense, (4) 1
UMIN 0225.txt
will have the ".txt" removed and be made into two
fields, one for Univ. of Minnesota, the second for the
newspaper date (February 25). The date field may be further
recoded into groupings by the statistical software.
DH134.TXT
will have the ".TXT" removed and separated into two
fields, "DH" for a high performer and "134" for the
respondent's ID number.
C87
will similarly be two fields, with the "C" for
conservative party and "87" indicating the year of the party
manifesto.
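The file name procedure above can be sketched in a few lines of Python. This is only an illustration of the four steps as described here, not the Inquirer's own Java code; the function name filename_to_id_fields is hypothetical, and the sketch assumes that "begins with a character" in step 4 means "begins with a letter".

```python
import re


def filename_to_id_fields(filename):
    """Split a file name into ID fields (illustrative sketch only)."""
    # Step 1: drop everything from the first period onward (".txt", ".doc").
    stem = filename.split(".", 1)[0]
    # Step 2: each space-separated word becomes its own field.
    words = stem.split()
    if not words:
        return {}
    # Step 4: if the last word starts with a letter (assumed reading)
    # and contains a digit, split it in two at the first digit.
    last = words[-1]
    first_digit = re.search(r"\d", last)
    if last[0].isalpha() and first_digit:
        cut = first_digit.start()
        words[-1:] = [last[:cut], last[cut:]]
    # Step 3: label the fields ID1, ID2, and so on.
    return {f"ID{i}": word for i, word in enumerate(words, start=1)}


if __name__ == "__main__":
    for name in ["bush speech defense1", "UMIN 0225.txt", "DH134.TXT", "C87"]:
        print(name, "->", filename_to_id_fields(name))
```

Running the sketch on the four example names is a quick way to check that a planned naming scheme will yield the ID columns you expect before processing a whole folder.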
If more identification information is needed than can be fit into
a file name (because of the restrictions on file name length in your
system) then the file name should contain a unique ID that can be
linked to a row in a spreadsheet, for later merging with the
Inquirer's output spreadsheet.
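As a rough illustration of that later merge, the sketch below joins rows on a shared ID column using plain Python dictionaries. The function name merge_on_id, the column names (ID1, tag_count, age) and the sample values are all hypothetical; most statistical packages offer an equivalent merge or join operation.

```python
def merge_on_id(inquirer_rows, extra_rows, id_column="ID1"):
    """Attach extra identification columns to Inquirer output rows.

    Both arguments are lists of dicts, one dict per spreadsheet row.
    Rows are matched on id_column (a hypothetical column name).
    """
    # Index the extra spreadsheet by its unique ID for fast lookup.
    extra_by_id = {row[id_column]: row for row in extra_rows}
    merged = []
    for row in inquirer_rows:
        combined = dict(row)
        # Rows with no match in the extra spreadsheet are kept unchanged.
        combined.update(extra_by_id.get(row[id_column], {}))
        merged.append(combined)
    return merged


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    inquirer_output = [{"ID1": "R001", "tag_count": "12"}]
    respondent_info = [{"ID1": "R001", "age": "34"}]
    print(merge_on_id(inquirer_output, respondent_info))
```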
Special versions of the Inquirer are also
operational:
1) For open-ended responses contained in a
column of Excel spreadsheet cells:
The Excel spreadsheet is saved
as a tab-delimited file. The Inquirer processes each cell in the
specified column as a unit and produces an output record of tag
counts for each cell. A single cell can contain up to 32,000
characters of text, or more than 3,000 words.
2) For real-time feedback:
Example: The computer presents
a "TAT" picture and asks the respondent to tell a story about the
picture that has a beginning, middle and end. The computer then
gives instant feedback to the respondent about the story. This
feedback can be given right over the internet.
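For the spreadsheet version described in item 1, it can help to check a tab-delimited export before handing it to the Inquirer. The sketch below reads such a file and returns the text of each cell in one column, treating each cell as a separate unit. It assumes the first row holds column headers; the function name cells_in_column and the column name "response" are hypothetical.

```python
import csv
import io


def cells_in_column(tab_delimited_text, column_name):
    """Return the text of every cell in one column, one unit per row.

    Assumes the first row of the tab-delimited text is a header row.
    """
    reader = csv.DictReader(io.StringIO(tab_delimited_text), delimiter="\t")
    return [row[column_name] for row in reader]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = "respondent\tresponse\n1\tI liked the service.\n2\tToo slow.\n"
    for cell in cells_in_column(sample, "response"):
        # Report the length so cells near the 32,000-character limit stand out.
        print(len(cell), cell)
```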