com.googlecode.whatswrong.io
Class GaleAlignmentFormat

java.lang.Object
  extended by com.googlecode.whatswrong.io.GaleAlignmentFormat
All Implemented Interfaces:
CorpusFormat

public class GaleAlignmentFormat
extends java.lang.Object
implements CorpusFormat

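The GaleAlignmentFormat reads bilingual alignment data in an XML-like format. The source element contains the tokenized source sentence, and the translation element contains the tokenized target sentence. The matrix element contains a matrix in which each column corresponds to a source token and each row corresponds to a target token. Its first row and first column indicate which tokens are null-aligned (the first row for source tokens, the first column for target tokens); the remainder of the matrix is the alignment matrix proper. For a source sentence of n tokens and a target sentence of m tokens, the matrix therefore has m+1 rows and n+1 columns. In the example below, the 9 German tokens and 8 English tokens give 9 rows of 10 values each, and the second row aligns I've with both Ich and habe. The seg element may carry the id of the sentence, but this is optional; what matters is that each sentence has its own seg element.
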
<seg id=1>
<source>Ich habe den Fehler in meiner Sprachverarbeitung gefunden .</source>
<translation>I've found the error in my NLP .</translation>
<matrix>
0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1
</matrix>
<seg id=2> ...
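To make the layout concrete, here is a minimal sketch, not the library's own parser, that decodes one seg entry like the one above. It assumes tokens are separated by whitespace and that the matrix holds whitespace-separated 0/1 values in row-major order (row 0 first). GaleSegSketch, Link and the helper methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical decoder for a single seg entry; not part of whatswrong. */
public class GaleSegSketch {

    /** An alignment link between a 0-based source index and a 0-based target index. */
    static final class Link {
        final int source;
        final int target;
        Link(int source, int target) { this.source = source; this.target = target; }
        @Override public String toString() { return source + "-" + target; }
    }

    /** Returns the trimmed text between <tag> and </tag> in the given seg chunk. */
    static String element(String seg, String tag) {
        Matcher m = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL).matcher(seg);
        if (!m.find()) {
            throw new IllegalArgumentException("missing <" + tag + "> element");
        }
        return m.group(1).trim();
    }

    /** Reads the cells row-major and checks the (m+1) x (n+1) shape. */
    static int[][] matrix(String seg) {
        int cols = element(seg, "source").split("\\s+").length + 1;      // n source tokens + null column
        int rows = element(seg, "translation").split("\\s+").length + 1; // m target tokens + null row
        String[] cells = element(seg, "matrix").split("\\s+");
        if (cells.length != rows * cols) {
            throw new IllegalArgumentException("expected " + rows * cols + " cells, got " + cells.length);
        }
        int[][] m = new int[rows][cols];
        for (int i = 0; i < cells.length; i++) {
            m[i / cols][i % cols] = Integer.parseInt(cells[i]);
        }
        return m;
    }

    /** Collects links from the inner matrix; row 0 and column 0 are the null markers. */
    static List<Link> links(int[][] m) {
        List<Link> result = new ArrayList<>();
        for (int r = 1; r < m.length; r++) {
            for (int c = 1; c < m[r].length; c++) {
                if (m[r][c] == 1) {
                    result.add(new Link(c - 1, r - 1)); // column = source token, row = target token
                }
            }
        }
        return result;
    }

    /** Source tokens flagged as null-aligned in row 0 (0-based indices). */
    static List<Integer> nullAlignedSource(int[][] m) {
        List<Integer> result = new ArrayList<>();
        for (int c = 1; c < m[0].length; c++) {
            if (m[0][c] == 1) {
                result.add(c - 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String seg = "<seg id=1>\n"
            + "<source>Ich habe den Fehler in meiner Sprachverarbeitung gefunden .</source>\n"
            + "<translation>I've found the error in my NLP .</translation>\n"
            + "<matrix>\n"
            + "0 0 0 0 0 0 0 0 0 0\n0 1 1 0 0 0 0 0 0 0\n0 0 0 0 0 0 0 0 1 0\n"
            + "0 0 0 1 0 0 0 0 0 0\n0 0 0 0 1 0 0 0 0 0\n0 0 0 0 0 1 0 0 0 0\n"
            + "0 0 0 0 0 0 1 0 0 0\n0 0 0 0 0 0 0 1 0 0\n0 0 0 0 0 0 0 0 0 1\n"
            + "</matrix>\n";
        int[][] m = matrix(seg);
        System.out.println(links(m));
        System.out.println(nullAlignedSource(m));
    }
}
```

For the example entry this prints [0-0, 1-0, 7-1, 2-2, 3-3, 4-4, 5-5, 6-6, 8-7] followed by []. Indices are 0-based and written source-target, so 7-1 links gefunden to found, and no source token is null-aligned.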

Author:
Sebastian Riedel

Nested Class Summary
Nested classes/interfaces inherited from interface com.googlecode.whatswrong.io.CorpusFormat
CorpusFormat.Monitor

Constructor Summary
GaleAlignmentFormat()

Method Summary
 javax.swing.JComponent getAccessory()
          Returns the GUI element that controls how this format is to be loaded.
 java.lang.String getLongName()
          Returns a longer name that may contain information about the configuration of this format.
 java.lang.String getName()
          Returns the name of this format.
 java.util.List<NLPInstance> load(java.io.File file, int from, int to)
          Loads a corpus from a file, starting at instance index from (inclusive) and ending at instance index to (exclusive).
 void loadProperties(java.util.Properties properties, java.lang.String prefix)
          Loads a configuration for this format from the given Properties object.
 void saveProperties(java.util.Properties properties, java.lang.String prefix)
          Saves the configuration of this format to a Properties object.
 void setMonitor(CorpusFormat.Monitor monitor)
          Sets the object that monitors the progress of this format when loading a file.
 java.lang.String toString()
          Returns the name of this format.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Detail

GaleAlignmentFormat

public GaleAlignmentFormat()
Method Detail

getName

public java.lang.String getName()
Returns the name of this format.

Specified by:
getName in interface CorpusFormat
Returns:
the name of this format.

getLongName

public java.lang.String getLongName()
Returns a longer name that may contain information about the configuration of this format.

Specified by:
getLongName in interface CorpusFormat
Returns:
the long name of this format.

getAccessory

public javax.swing.JComponent getAccessory()
Returns the GUI element that controls how this format is to be loaded.

Specified by:
getAccessory in interface CorpusFormat
Returns:
the GUI element that controls how this format is to be loaded.

setMonitor

public void setMonitor(CorpusFormat.Monitor monitor)
Sets the object that monitors the progress of this format when loading a file.

Specified by:
setMonitor in interface CorpusFormat
Parameters:
monitor - the monitor for this format.

loadProperties

public void loadProperties(java.util.Properties properties,
                           java.lang.String prefix)
Loads a configuration for this format from the given Properties object.

Specified by:
loadProperties in interface CorpusFormat
Parameters:
properties - the Properties object to load from.
prefix - the prefix that properties for this format have in the Properties object.

saveProperties

public void saveProperties(java.util.Properties properties,
                           java.lang.String prefix)
Saves the configuration of this format to a Properties object.

Specified by:
saveProperties in interface CorpusFormat
Parameters:
properties - the Properties object to which the configuration of this format is saved.
prefix - the prefix that the properties should have.
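The prefix parameter of loadProperties and saveProperties presumably lets several format configurations share one Properties object without their keys colliding. The sketch below is a hypothetical illustration of that convention using only java.util.Properties; the class PrefixedConfigSketch, the key suffix ".exampleFlag" and the prefixes "first" and "second" are invented and are not the keys GaleAlignmentFormat actually writes.

```java
import java.util.Properties;

/** Hypothetical format configuration showing how a key prefix keeps settings apart. */
public class PrefixedConfigSketch {
    // Hypothetical option; not a documented GaleAlignmentFormat setting.
    boolean exampleFlag = false;

    /** Writes this configuration under keys that start with the given prefix. */
    void saveProperties(Properties properties, String prefix) {
        properties.setProperty(prefix + ".exampleFlag", Boolean.toString(exampleFlag));
    }

    /** Reads the configuration back, keeping the current value if the key is absent. */
    void loadProperties(Properties properties, String prefix) {
        String value = properties.getProperty(prefix + ".exampleFlag");
        if (value != null) {
            exampleFlag = Boolean.parseBoolean(value);
        }
    }

    public static void main(String[] args) {
        Properties shared = new Properties();

        PrefixedConfigSketch first = new PrefixedConfigSketch();
        first.exampleFlag = true;
        first.saveProperties(shared, "first");

        PrefixedConfigSketch second = new PrefixedConfigSketch();
        second.saveProperties(shared, "second"); // stored as "false" under its own prefix

        PrefixedConfigSketch restored = new PrefixedConfigSketch();
        restored.loadProperties(shared, "first");
        System.out.println(restored.exampleFlag);                     // true
        System.out.println(shared.getProperty("second.exampleFlag")); // false
    }
}
```

Leaving the current value in place when a key is missing is a design choice of this sketch; it lets a freshly created object keep its defaults when an older configuration file lacks a setting.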

load

public java.util.List<NLPInstance> load(java.io.File file,
                                        int from,
                                        int to)
                                 throws java.io.IOException
Loads a corpus from a file, starting at instance index from (inclusive) and ending at instance index to (exclusive). Implementations are required to call CorpusFormat.Monitor.progressed(int) after each instance they process.

Specified by:
load in interface CorpusFormat
Parameters:
file - the file to load the corpus from.
from - the index of the first instance to load (inclusive).
to - the index at which loading stops (exclusive).
Returns:
a list of NLP instances loaded from the given file in the given interval.
Throws:
java.io.IOException - if an I/O error occurs while reading the file.
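The half-open from/to interval and the monitor callback can be illustrated in isolation. This is a minimal sketch under stated assumptions, not the actual implementation: LoadRangeSketch and its Monitor interface are hypothetical stand-ins (the real CorpusFormat.Monitor is reduced here to progressed(int)), strings replace NLPInstance objects, and passing the instance index to progressed is an assumption, since the meaning of its int argument is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadRangeSketch {

    /** Hypothetical stand-in for CorpusFormat.Monitor, reduced to progressed(int). */
    interface Monitor {
        void progressed(int index);
    }

    /**
     * Returns the items at positions from (inclusive) to to (exclusive), stopping
     * early if the input is shorter, and notifies the monitor after each one.
     * Strings stand in for parsed NLPInstance objects.
     */
    static List<String> load(List<String> instances, int from, int to, Monitor monitor) {
        List<String> result = new ArrayList<>();
        int end = Math.min(to, instances.size());
        for (int i = Math.max(from, 0); i < end; i++) {
            result.add(instances.get(i));
            if (monitor != null) {
                monitor.progressed(i); // assumption: the argument is the instance index
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> segs = List.of("seg 1", "seg 2", "seg 3", "seg 4");
        List<Integer> reported = new ArrayList<>();
        List<String> loaded = load(segs, 1, 3, i -> reported.add(i));
        System.out.println(loaded);   // [seg 2, seg 3]
        System.out.println(reported); // [1, 2]
    }
}
```

The sketch reports only the instances it returns; whether the real method also counts the instances it skips before from is not specified here.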

toString

public java.lang.String toString()
Returns the name of this format.

Overrides:
toString in class java.lang.Object
Returns:
the name of this format.


Copyright © 2009. All Rights Reserved.