MAKING DC-CAM DOCUMENTS
Enhancement, Preservation, and Security of Archives: Web portal, in Khmer and English, to approximately 100,000 pages of currently non-public records from the Documentation Center of Cambodia (DC-Cam)’s archive in support of the Khmer Rouge Tribunal. This activity will support the development of a web portal that will provide access to approximately 100,000 pages of records contained within DC-Cam’s archive. Development of this web portal will involve a number of supporting activities, ranging from time-intensive tasks such as transcription, translation, and scanning of records to highly technical activities such as designing the user interface and developing optical character recognition (OCR) applications in Khmer and English. All of these activities will be managed by an information technology consultancy, which will work with DC-Cam and provide technical expertise in developing the tools and capabilities to make its archive accessible to the Cambodian people and the world.
DC-Cam’s archives contain well over one million pages of records. Well over 50 percent of these records have not been scanned and are available only by manual retrieval of the physical document. DC-Cam has placed a few dozen of its most significant records online for retrieval and download; however, the vast majority of its archive remains accessible only by physical retrieval at the Center. The goal of this activity is to build a web portal through which a member of the public can retrieve these records from their own personal computer or laptop. The general supporting tasks for this activity are set forth below.
First, many of DC-Cam’s records are hand-written and in poor physical condition, which precludes the possibility of quick scanning. In their current condition, such documents would not be optical character recognition (OCR) capable even if they were scanned. In order to address this circumstance, DC-Cam would need to manually transcribe these difficult records into a digital form that could then be catalogued and retrieved from a digital archive. Transcription of the record would allow some of the more difficult records’ information to be retrievable in a digital form.
Digital retrieval, however, does not by itself make a record OCR capable. Khmer is written continuously to the end of a line, and a new line starts whenever the horizontal space runs out. This differs from English, which separates words with white space. The techniques for rendering Khmer records OCR compatible will therefore need to be different from the methods used for English. Working with a consultant, DC-Cam will work to make most, if not all, of the records offered in this web-accessible archive OCR capable.
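The difference in word spacing described above can be illustrated with a minimal Python sketch. It is not the project's OCR method; it only shows why splitting on white space, which works for English, returns a Khmer sentence as a single unbroken token, and how a dictionary lookup could recover word boundaries. The three-word lexicon, the sample sentence, and the function name segment_longest_match are assumptions made for illustration only.

```python
# Illustrative sketch only: a toy lexicon and a greedy longest-match segmenter.
# A real system would need a full Khmer lexicon or a trained segmentation model.

WORDS = ["ខ្ញុំ", "ស្រឡាញ់", "កម្ពុជា"]  # hypothetical toy lexicon: "I", "love", "Cambodia"
LEXICON = set(WORDS)

ENGLISH = "I love Cambodia"
KHMER = "".join(WORDS)  # Khmer text is written without spaces between words


def segment_longest_match(text, lexicon):
    """Split text into words by greedy left-to-right longest match."""
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
        else:
            # No lexicon entry starts here: emit one code point and move on.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(len(ENGLISH.split()))  # white space separates the English words
    print(len(KHMER.split()))    # the Khmer sentence comes back as one token
    print(segment_longest_match(KHMER, LEXICON))
```

Greedy matching is the simplest possible choice and fails on ambiguous or out-of-lexicon words, which is one reason Khmer text processing generally needs tools different from those used for English.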
Second, many of DC-Cam’s records have never been translated from Khmer into English or other languages, including Chinese. In order to enhance the foreign community’s access to the information in these records, DC-Cam would need to dedicate significant time to translation of the records. Using a team of translators, DC-Cam will work to make most, if not all, of the records offered in this web-accessible archive available in English.
Third, building this web portal will require developing the website and its user and administrative interfaces, as well as hosting the archive portal. DC-Cam’s information technology (IT) team will work alongside the consultant in the design of the website, including its navigational tools, which will be available in English and Khmer, based on a user’s language preference.
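One way a site could choose between Khmer and English navigation is to read the language preference sent by the user's browser. The Python sketch below assumes the preference arrives as an HTTP Accept-Language header value and that English is the fallback; both are assumptions for illustration, as are the names SUPPORTED, DEFAULT, and pick_language. The proposal does not specify how the preference will be obtained.

```python
# Illustrative sketch only: choose "km" (Khmer) or "en" (English) from an
# Accept-Language style string such as "km-KH,km;q=0.9,en;q=0.8".

SUPPORTED = ("km", "en")  # assumed language codes for the portal
DEFAULT = "en"            # assumed fallback when nothing matches


def pick_language(accept_language, supported=SUPPORTED, default=DEFAULT):
    """Return the supported language with the highest q-value, else the default."""
    best_lang, best_q = default, 0.0
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0  # a language listed without a q-value has weight 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                continue  # skip malformed entries
        primary = tag.strip().lower().split("-")[0]  # "km-KH" -> "km"
        if primary in supported and q > best_q:
            best_lang, best_q = primary, q
    return best_lang


if __name__ == "__main__":
    print(pick_language("km-KH,km;q=0.9,en;q=0.8"))
    print(pick_language("fr-FR,fr;q=0.9"))
```

A production site would more likely also let users switch language explicitly and remember that choice, with the browser preference serving only as the initial default.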
Fourth, the project will conduct a limited user needs assessment, with pilot testing, stakeholder input, and a feedback system to ensure users and stakeholders can provide input on deficiencies, bugs, and ways to improve the website.
Finally, the project will also employ Geographic Information Systems (GIS) technologies and expertise to update all major maps DC-Cam currently has on file, such as the maps of the Killing Fields, prisons, memorials, and outreach locations. Almost none of DC-Cam’s maps have been updated since the early 2000s. Since then, data on new killing sites, prisons, memorials, and other locations have become available. Using GIS technologies, DC-Cam wants to update all of its major maps, incorporating more data at various levels to give the reader richer context.
1. Transcription of records not easily scanned
2. Render records OCR capable
3. Translation of records from Khmer to English
4. Development of the website and user/administrative interfaces
5. User needs assessment, pilot testing, stakeholder input & feedback system
6. GIS map updates