Thesis projects Phd in Computer Science

Thesis projects

This section lists possible thesis projects proposed by researchers in the doctoral programme. Interested doctoral
candidates can contact the researcher involved.

  • Supervisor: Felipe Sánchez-Martínez (fsanchez@dlsi.ua.es)
    Line of research: Machine translation, digital libraries and computer-aided education
    Title of the project: Hybrid approaches for hierarchical phrase-based statistical machine translation
    Description: The leading paradigm in machine translation today is statistical machine translation. There are two types of statistical machine translation systems that do not use any explicit linguistic information: traditional phrase-based systems, which use flat segment-level translation units, and hierarchical phrase-based systems, which use probabilistic synchronous context-free grammars, i.e. tree-based translation units. Both types can be built with little or no human effort when large parallel corpora are available. However, the amount of parallel text required to achieve state-of-the-art results is not always available. As a result, rule-based machine translation systems are still being actively developed, especially for under-resourced languages, since they do not rely on existing parallel corpora but on explicit representations of linguistic information such as bilingual lexicons and structural transfer (grammatical transformation) rules. Hybrid approaches that take advantage of both paradigms have been developed in recent years to improve translation quality, with a special focus on the hybridisation of rule-based systems and phrase-based statistical machine translation systems. The supervisor of this project has recently supervised a Ph.D. thesis on the automatic inference of structural transfer rules for rule-based machine translation and their integration into phrase-based statistical machine translation. The aim of this thesis is to explore the inference of structural transfer rules suitable for integration into hierarchical phrase-based statistical machine translation systems, and to explore new hybrid approaches for integrating explicit linguistic resources (bilingual lexicons and structural transfer rules) into hierarchical phrase-based statistical machine translation.
  • Supervisors: Mikel L. Forcada (mlf@ua.es) and Felipe Sánchez-Martínez (fsanchez@dlsi.ua.es)
    Line of research: Machine translation, digital libraries and computer-aided education
    Title of the project: Technology-independent automatic postediting of machine translation
    Description: Machine translation output usually needs to be postedited (revised by translators) to make it suitable for a specific communication purpose. When the texts belong to a specific domain, it is possible to partially automate the postediting process (and therefore reduce the effort) by means of what is called 'automatic postediting'. Automatic postediting can learn from the work of professional posteditors. Up to now, this learning has been done using relatively complex statistical machine translation systems that have to be trained with large amounts of postediting data. The aim of this thesis is to explore the possibility of learning, using the existing machine translation system and independently of how it operates, a lightweight automatic postediting method that can be implemented easily in a computer-aided translation environment and that is capable of learning from the work of translation professionals.
  • Supervisors: Mikel L. Forcada (mlf@ua.es) and Felipe Sánchez-Martínez (fsanchez@dlsi.ua.es)
    Line of research: Machine translation, digital libraries and computer-aided education
    Title of the project: Standardisation and strategies of the process of harvesting parallel corpora from the Internet
    Description: Methods that collect parallel corpora from the Internet have recently come into widespread use. These corpora can be used as translation memories in computer-aided (professional) translation or to train statistical machine translation systems (the best-known example being Google's translator). There are several free tools, such as Bitextor or ILSP Focussed Crawler, which produce entirely different results because they follow different strategies regarding (a) how to organise the exploration and the following of links; (b) how to decide whether two harvested documents are mutual translations; and (c) how to segment the documents and align the resulting segments. A theoretical formalisation of these three processes (exploration, coupling and alignment) is therefore needed that makes it possible, on the one hand, to describe existing implementations in abstract terms (and with the capacity to predict their behaviour) and, on the other hand, to define reference implementations of successful strategies.
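    As a rough illustration of step (b), the sketch below couples harvested documents using their URLs alone: two URLs are treated as candidate mutual translations when they differ only in a language-code path segment. This is a deliberately naive heuristic and is not claimed to be the strategy used by Bitextor or ILSP Focussed Crawler; the function name `pair_by_url`, the language codes and the example URLs are assumptions made for the example.

```python
import re


def pair_by_url(urls, lang_a="en", lang_b="es"):
    """Pair URLs that differ only in a language-code path segment."""

    def key(url, lang):
        # Replace the first path segment equal to `lang` with a neutral marker.
        pattern = rf"(?<=/){re.escape(lang)}(?=/|$)"
        replaced, count = re.subn(pattern, "{lang}", url, count=1)
        return replaced if count else None

    # Index the lang_a documents by their language-neutral URL.
    index_a = {key(u, lang_a): u for u in urls if key(u, lang_a)}

    pairs = []
    for u in urls:
        k = key(u, lang_b)
        if k in index_a:
            pairs.append((index_a[k], u))
    return pairs


harvested = [
    "http://example.org/en/about",
    "http://example.org/es/about",
    "http://example.org/en/contact",   # no Spanish counterpart harvested
    "http://example.org/news",         # no language marker at all
]
pairs = pair_by_url(harvested)
```

    A content-based strategy (for instance, comparing document lengths or structure) would make different coupling decisions on the same crawl, which is the kind of variation the proposed formalisation would have to describe.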
  • Supervisors: Mikel L. Forcada (mlf@ua.es) and Felipe Sánchez-Martínez (fsanchez@dlsi.ua.es)
    Line of research: Machine translation, digital libraries and computer-aided education
    Title of the project: Using postediting effort to guide the tuning and selection of sources of translation assistance
    Description: On the one hand, the evaluation and tuning of sources of translation assistance such as machine translation is currently done with automatic evaluation measures that cannot ensure a reduction in postediting effort. On the other hand, current methods for estimating the quality of the output of these sources cannot easily be used across sources and therefore cannot be used to select the best one for each segment. The supervisors have recently defined a general framework for the measurement and estimation of postediting effort that allows sources of translation assistance to be both optimised and combined in a principled manner, so that professional translators benefit from the seamless integration of all the technologies at their disposal when working on a translation job. The aim of this thesis is to integrate existing results in automatic evaluation, quality estimation, and assistance source selection into this framework, and to produce and test complete reference implementations in real-world computer-aided translation scenarios.
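    To make the idea of comparing sources by postediting effort concrete, here is a minimal sketch that measures effort after the fact as the word-level edit distance between each source's proposal and the final postedited segment. Edit distance is only a common stand-in for effort and is an assumption of this example, not the supervisors' framework; the names `word_edit_distance`, `effort` and `least_effort_source` and the sample segments are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions and substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,               # delete h
                           cur[j - 1] + 1,            # insert r
                           prev[j - 1] + (h != r)))   # substitute or match
        prev = cur
    return prev[-1]


def effort(proposal, postedited):
    """Edit operations per word of the postedited segment."""
    hyp, ref = proposal.split(), postedited.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)


def least_effort_source(proposals, postedited):
    """Name of the source whose proposal needed the fewest edits."""
    return min(proposals, key=lambda name: effort(proposals[name], postedited))


proposals = {
    "mt": "the house green is",    # e.g. machine translation output
    "tm": "the green house was",   # e.g. a translation memory match
}
best = least_effort_source(proposals, "the green house is")
```

    In a real setting the postedited segment is not available when a source has to be chosen, so this measurement would have to be replaced by an estimate of effort.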
  • Supervisor: Mikel L. Forcada email: mlf@ua.es
    Line of research: Machine translation, digital libraries and computer-aided education
    Title of the project: Effect of manual word-alignment guidelines in phrase pair extraction using discriminatively-learned aligners.
    Description: Current statistical machine translation techniques are based on the concept of phrase (multi-word segment) pairs. These phrase pairs are extracted from sentence-aligned parallel corpora using automatic word aligners. One way in which word aligners can be learned is through discriminative, supervised learning from manually word-aligned sentence-pair corpora. Manual word alignment is usually done by following explicit guidelines that dictate when two words should be aligned, but there may be more than one way to formulate these guidelines, as there is no single agreed way to represent translation equivalence through word alignment. This thesis explores the effect of the guidelines used on the quality of the extracted phrase pairs and of the resulting machine translation.
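    The dependence of phrase pairs on word alignment can be seen in a small sketch. It enumerates source and target spans by brute force and keeps those that are consistent with the alignment: at least one link falls inside the pair and no link connects a word inside it to a word outside it. This is a simplified rendering of the usual consistency criterion, written for clarity rather than efficiency; the function name, the `max_len` parameter and the example sentence pair are assumptions.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return all phrase pairs consistent with a set of (src_idx, tgt_idx) links."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            for j1 in range(len(tgt)):
                for j2 in range(j1, min(j1 + max_len, len(tgt))):
                    # Links with both ends inside the candidate pair.
                    inside = any(i1 <= i <= i2 and j1 <= j <= j2
                                 for i, j in alignment)
                    # Links with exactly one end inside the candidate pair.
                    crossing = any((i1 <= i <= i2) != (j1 <= j <= j2)
                                   for i, j in alignment)
                    if inside and not crossing:
                        pairs.add((" ".join(src[i1:i2 + 1]),
                                   " ".join(tgt[j1:j2 + 1])))
    return pairs


src = ["la", "casa", "verde"]
tgt = ["the", "green", "house"]

# Two hypothetical annotation outcomes for the same sentence pair:
# the second leaves "verde"/"green" unaligned.
full_links = {(0, 0), (1, 2), (2, 1)}
sparse_links = {(0, 0), (1, 2)}

pairs_full = extract_phrase_pairs(src, tgt, full_links)
pairs_sparse = extract_phrase_pairs(src, tgt, sparse_links)
```

    With the second annotation, unaligned words can attach to neighbouring phrases, so pairs such as ("casa", "green house") are extracted that the first annotation rules out. Differences of this kind are what a change in alignment guidelines could propagate into the phrase table.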
  • Supervisor: Juan Ramón Rico email: JuanRamonRico@ua.es
    Research line: Pattern recognition, machine learning
    Project title: Study of empirical quality measures for online learning systems
    Description: There are several learning systems based on incremental or adaptive models (which require little computational cost to update their predictive models) that achieve an acceptable rate of correct answers once they have seen a sufficient number of examples. A new approach would be to study how the quality of the prototypes on which the model is based evolves over time, for the purpose of:
    - quantifying how valid the predictions for each class are at each instant;
    - avoiding learning inadequate prototypes (noise);
    - correcting or adjusting the model's final decisions.
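    One possible reading of these goals is sketched below: a nearest-prototype classifier keeps a running quality score for each prototype and ignores prototypes whose score falls below a threshold when making its final decision. The smoothed hit rate used as the quality measure, the threshold value and all class and method names are assumptions made for illustration; the project itself leaves the choice of measure open.

```python
import math
from dataclasses import dataclass


@dataclass
class Prototype:
    point: tuple
    label: str
    used: int = 0      # times this prototype was the nearest one
    correct: int = 0   # times its label matched the true label

    @property
    def quality(self) -> float:
        # Laplace-smoothed hit rate, so unseen prototypes start at 0.5.
        return (self.correct + 1) / (self.used + 2)


class QualityTrackingClassifier:
    def __init__(self, prototypes, min_quality=0.2):
        self.prototypes = list(prototypes)
        self.min_quality = min_quality

    def _nearest(self, x, candidates):
        return min(candidates, key=lambda p: math.dist(p.point, x))

    def predict(self, x):
        # Decide using trusted prototypes only; fall back to all if none qualify.
        trusted = [p for p in self.prototypes if p.quality >= self.min_quality]
        return self._nearest(x, trusted or self.prototypes).label

    def observe(self, x, y):
        # Update the quality of the prototype that would have answered.
        p = self._nearest(x, self.prototypes)
        p.used += 1
        p.correct += int(p.label == y)


clf = QualityTrackingClassifier([
    Prototype((0.0, 0.0), "a"),
    Prototype((5.0, 5.0), "b"),
    Prototype((0.1, 0.1), "b"),   # a noisy prototype inside class "a" territory
])
before = clf.predict((0.1, 0.1))
for _ in range(4):
    clf.observe((0.1, 0.1), "a")  # labelled feedback arrives over time
after = clf.predict((0.1, 0.1))
```

    The noisy prototype initially decides the prediction; after repeated contradicting feedback its quality drops below the threshold and the decision reverts to the surrounding class.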
  • Supervisor: Sergio Luján email: sergio.lujan@ua.es
    Research line: Machine translation, digital libraries and computer-aided education
    Project title: Development of an evaluation system based on open educational resources to help locate high quality resources.
    Description: Open Educational Resources (OER) are materials published under open-access licences that usually allow reuse. At present there is a large volume of OER published on the Web, but there is no mechanism to control their quality.
    This thesis aims to develop a system that gives users a simple way to assess OER. Initially, the feasibility of different solutions (comment analysis, distributed open collaboration or crowdsourcing, download and usage statistics, etc.) should be studied in order to adopt the most appropriate technique. Next, a model and a prototype that demonstrates its operation should be developed.
  • Supervisor: Sergio Luján email: sergio.lujan@ua.es
    Research line: Machine translation, digital libraries and computer-aided education
    Project title: Semantic Web application to enhance education information retrieval
    Description: A microformat is a simple way to add semantics to content that is readable by a human but would otherwise be just plain text to a computer system. Microformats are part of the Semantic Web (Web 3.0). Microformats have been applied to several domains, such as cooking recipes, film and book reviews, and companies and organisations. However, they have not yet been applied to the educational domain. This thesis aims to study the applicability of existing microformats to the educational domain (for example, to add semantic information to course announcements or to publications of educational resources) and to develop new microformats that fill the gaps left by current ones.
  • Supervisor: Sergio Luján email: sergio.lujan@ua.es
    Research line: Software Engineering, Web Engineering and Business Intelligence
    Project Title: Development of heuristic methods for evaluating web accessibility
    Description: Web accessibility aims to make web pages usable by as many people as possible, regardless of their knowledge or personal capabilities and regardless of the technical features of the equipment used to access the Web. Currently, there are various guidelines to help create accessible web pages. Unfortunately, these guidelines do not cover all existing accessibility problems. Furthermore, automatic verification is not very precise. This is why heuristic methods have been developed to complement accessibility guidelines. This thesis intends to research current heuristic methods for the evaluation of web accessibility in order to develop a new set of heuristics that complement the existing ones and take into account the latest technological advances.
  • Supervisor: Sergio Luján email: sergio.lujan@ua.es
    Research line: Software Engineering, Web Engineering and Business Intelligence
    Project Title: Analysis of implementation level of Web accessibility at present
    Description: Web accessibility aims to make web pages usable by as many people as possible, regardless of their knowledge or personal capabilities and regardless of the technical features of the equipment used to access the Web. In most first world countries, there are laws requiring the websites of public authorities (government ministries, city councils, universities, etc.) to be accessible. However, compliance is very uneven. This thesis aims to analyse the level of implementation of web accessibility guidelines among those organisations that are required by law to comply with them. The aim of this work is to develop automated analysis methods for the websites of different countries and to obtain a final evaluation report on the current state of web accessibility.
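    As a minimal sketch of what one automated analysis step might look like, the code below scans a page for two machine-checkable problems: images without an `alt` attribute and an `<html>` element without a `lang` attribute. The choice of these two checks, the class name `AccessibilityChecker` and the sample page are assumptions for the example; a real evaluation would cover far more criteria and, as noted in the heuristic-evaluation project, automatic checks alone are not very precise.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Collects a few machine-checkable accessibility issues."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        line, _ = self.getpos()
        # An empty alt="" is allowed (decorative image); a missing one is not.
        if tag == "img" and "alt" not in names:
            self.issues.append(f"line {line}: <img> has no alt attribute")
        if tag == "html" and "lang" not in names:
            self.issues.append(f"line {line}: <html> has no lang attribute")


def check(html):
    """Return the list of issues found in an HTML string."""
    checker = AccessibilityChecker()
    checker.feed(html)
    checker.close()
    return checker.issues


page = '<html><body><img src="a.png"><img src="b.png" alt="Logo"></body></html>'
issues = check(page)
```

    Running `check` over the pages of many public-authority websites and aggregating the issue counts per country would give a crude first version of the evaluation report described above.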