iSGTW
Link - What are the odds?
Link of the Week: What are the odds?
Image courtesy Paddy Power
With the Large Hadron Collider started up, Paddy Power — Ireland’s largest online bookmaker — has started offering odds as to what the world’s largest machine will discover, and in what order.
The bookie is offering 11-to-10 odds that dark matter (no longer a dark horse candidate) will be found before black holes, and 8-to-1 odds that black holes will be first.
Dark energy sits at 12-to-1 odds.
And the site offers 100-to-1 odds that the LHC will find God. Not the “god particle” — aka the Higgs boson — but something bigger.
A spokesman for Paddy Power told the UK’s Daily Telegraph that “confirmation of God’s existence would have to be verified by scientists and given by an independent authority before any payouts were made, however.”
Tags: Europe Links/Statistics/Acronyms Physics
ISGTW Home Page: http://www.isgtw.org
Image - Beware of grids
Image courtesy Tobias Blanke
Tobias Blanke of King’s College London was looking for something to illustrate his presentation at ISGC 2010 when he ran across this image. Which leaves us at a loss for words!
Tags: Europe Images
Feature - CLARIN: A project that speaks to you
Wee-Ta-Ra-Sha-Ro, Head Chief of the Wichita. Painted by George Catlin in 1834. Image courtesy Indigenouspeople.net
The creation story of the Wichita people tells of a creator, “Man-never-known-on-Earth,” who formed the world, land, water and the first man and woman: “Man-with-the-Power-to-Carry-Light” and “Bright-Shining-Woman.” This couple brought to the Earth light, corn-growing, deer-hunting, game-playing and prayer, before becoming the morning star and the moon.
While the story itself is preserved in literature for posterity (e.g., in George Dorsey’s 1904 book The Mythology of the Wichita), fewer than 10 people today, nearly all of them elders living on tribal lands in Oklahoma, USA, can tell the story in the Wichita language.
It’s a pattern repeated around the world; many languages are endangered or dying. Preserving these languages is vital for groups seeking to revitalize and maintain their culture.
Linguists have been recording and documenting endangered languages for as long as there has been recording equipment, or about 120 years. What has been lacking — until now — is a central place to search and access these data stores, which are scattered around the world. To remedy this, the CLARIN project is studying and preparing to provide comprehensive language research and preservation tools.
CLARIN, or Common Language Resources and Technology Infrastructure, began preparing its infrastructure in 2008. At the end of 2010, it expects to move into the construction phase. Its goal is seamless access to language archives and applications; by doing so, CLARIN hopes to become an invaluable tool for helping to document and understand our languages — and therefore understand ourselves.
The newest edition of UNESCO’s Atlas of the World’s Languages in Danger totes up 6,000 world languages — and counts 2,500 as endangered and 200 as lost. The interactive atlas ranks the 2,500 endangered languages by five levels of vitality: unsafe, definitely endangered, severely endangered, critically endangered and extinct. Image courtesy UNESCO
An advantage to all
Many sectors of society will benefit, say CLARIN’s creators.
For instance, an educator or government official reviewing educational policy could search stored archives of children’s recordings in her country. Using this information, she could then compare indicators of linguistic sophistication — breadth of vocabulary, for example — among children of the same age from different regions in her country, or perhaps compare the language skills of boys and girls within the same age group.
Similarly, a historian researching a given politician could determine the frequency with which he used a certain word or phrase in a given month, year or decade. This kind of data could illuminate the germination of a political idea or movement.
Or a dictionary writer could clarify and expand a word’s meaning based upon the syntax and phrases commonly associated with that entry.
And a teacher seeking to expand his students’ horizons could show them language systems radically different from their own. One example is Kuuk Thaayorre, spoken by aboriginal people of Far North Queensland, Australia — a language that contains no words for left or right. Directions (north, south, east and west) do the job instead. Consequently, its speakers have a heightened spatial awareness, states linguistic researcher Lera Boroditsky of Stanford University, in an article on the website Edge:
“ . . . you have to say things like ‘There's an ant on your southeast leg’ or ‘Move the cup to the north-northwest a little bit.’ One obvious consequence of speaking such a language is that you have to stay oriented at all times, or else you cannot speak properly. The normal greeting in Kuuk Thaayorre is ‘Where are you going?’ and the answer should be something like ‘South-southeast, in the middle distance. . . ’ ”
Most likely you and I, in the absence of a compass, wouldn’t be able to get past “Hello.”
Unusual challenges
To create such a repository means overcoming a variety of challenges. “The needs of our users — as well as the needs of our sources — present some interesting problems,” says Martin Wynne, a member of CLARIN. For example, patient confidentiality must be preserved, and intellectual property rights respected. Consequently, sign-on to the CLARIN infrastructure will offer differing levels of access, with data from medical patients or children restricted, and recorded songs perhaps offered only to academics and not to commercial musicians.
More unusually, some data must be removed once the source dies.
The reason?
Upon the death of a Pitjantjatjara-speaking Aborigine in central Australia (near Uluru, or “Ayers Rock”), for example, anything associated with that person — such as photographs or recordings — becomes taboo for a mourning period that can last months or even years. Even the person’s name is not spoken; instead the word “Kuminjay” is substituted, in what anthropologists term “avoidance language.”
As a result, “We’ll have an ethical obligation to cut access to recordings of that person,” says CLARIN’s Peter Wittenberg.
Like a jigsaw puzzle
Besides the ethical considerations, the team needs to make sure that sources drawn upon by the CLARIN catalogue are reliable and persistent. A PhD student using CLARIN as a source for his thesis needs to trust that cited resources remain in place.
Wynne, Wittenberg and Daan Broeder of CLARIN recently visited the CERN IT department to observe how the Worldwide LHC Computing Grid and Enabling Grids for E-sciencE had approached security, monitoring and the provision of highly available services.
“We are at the stage of designing the architecture,” says Broeder. “It is like a jigsaw puzzle: some pieces are already defined and in place. We are now looking for the missing pieces. To the extent we can, we’d like to find preformed puzzle pieces that would be a good fit to save us from making and cutting our own.”
—Danielle Venton, EGEE
From UNESCO’s Atlas of the World’s Languages in Danger:
It is impossible to estimate the total number of languages that have disappeared over human history. Linguists have calculated the numbers of extinct languages for certain regions, such as, for instance, Europe and Asia Minor (75 languages) or the United States (115 languages lost in the last five centuries, of some 280 spoken at the time of Columbus). Some examples of recently extinct languages are:
• Manx (Isle of Man) — 1974, with the death of Ned Maddrell
• Aasax (Tanzania) — 1976
• Ubyh (Turkey) — 1992, with the death of Tefvic Esenc
• Eyak (United States, Alaska) — 2008, with the death of Marie Smith Jones
Feature - Frontier guides computing through the collision landscape
Just like you might have trouble navigating using this antique map, detector experiments can’t make sense of their data using an out-of-date map of their detector. Image courtesy Boston Public Library’s Norman B. Leventhal Map Center under Creative Commons license
The colossal particle detectors that monitor collisions at the Tevatron in Illinois and the Large Hadron Collider in Switzerland are unique beasts.
Scientists design most of the parts inside them to meet an individual set of specifications. But every once in a while, they find something the detectors can share.
Scientists at the CMS and ATLAS experiments at CERN are using a software system that Fermilab’s Computing Division originally designed for the CDF experiment at the Tevatron. The system, called Frontier, helps scientists distribute, at lightning speed, the information needed to interpret collision data. It is based upon the widely used Squid web cache technology.
“Since data is often shared between sites or pulled from a remote site, the speed of data return is critical,” said John DeStefano, an engineer at the RHIC and ATLAS Computing Facility at Brookhaven National Laboratory. “Not even the fastest database servers can bridge the physical gap between geographically disparate sites. People noticed how efficiently Frontier worked for CMS, and so far there has been a notable benefit for ATLAS as well.”
Frontier caught on thanks to the interconnectedness of the particle physics community, said Fermilab engineer Liz Sexton-Kennedy. Many scientists now working on experiments at the LHC also worked on experiments at the Tevatron.
Fermilab computer scientists Jim Kowalkowski and Marc Paterno came up with the original idea for Frontier. A group of computer scientists at Fermilab who had previously gained experience with a similar system designed for the DZero experiment worked to implement the ideas at CDF. Another group from Johns Hopkins University contributed by testing the system.
A diagram of the Frontier architecture within CMS. Image courtesy Dave Dykstra, Fermilab
Adjusting for a changing frontier
Particle detectors like CDF, CMS and ATLAS are large, complex machines whose many parts shift by amounts that are imperceptible to the eye but critical to a scientist making precise measurements of particle tracks.
This makes reading data from inside a particle detector a bit like driving in a dream landscape whose features frequently shift. To navigate such an unpredictable setting, drivers continually need to swap out their maps for new, updated ones. In order to properly read data that detectors collect about an event, physicists need to know the lay of the land inside the detector at the time of collision.
What’s more, hundreds of thousands of computers around the world all need to pair that updated information with collision data as they analyze it, said Dave Dykstra, a Fermilab engineer who now heads the Frontier project.
“All of them need to load the data all at once,” he said. “It’s a big challenge.”
Scientists do not monitor the conditions of the detectors during each individual collision. In the CDF detector, beams of protons and antiprotons cross paths about 1.7 million times each second, each pass representing an opportunity for collisions. Scientists plan to cross beams of even more protons 3.1 million times per second in the CMS and ATLAS detectors once the LHC is up to full power.
Rather than try to keep up, scientists take new readings at frequent, regular intervals. A Frontier server takes information about the changing landscape of the detector from a database and sends it to other servers around the world, which then cache the information and share it with other, nearby computers. Only the Frontier server needs to request updated maps from the database.
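The caching arrangement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Frontier's actual code: the class names, the record key "alignment/run-1" and the choice of three sites with 1,000 requests each are all assumptions made for the example.

```python
class ConditionsDatabase:
    """Hypothetical stand-in for the central database of detector conditions."""

    def __init__(self, records):
        self._records = dict(records)
        self.queries = 0  # counts how often the database itself is asked

    def fetch(self, key):
        self.queries += 1
        return self._records[key]


class CachingTier:
    """Answers from its own memory and asks its upstream only on a miss."""

    def __init__(self, upstream):
        self._upstream = upstream
        self._cache = {}

    def fetch(self, key):
        if key not in self._cache:
            self._cache[key] = self._upstream.fetch(key)
        return self._cache[key]


database = ConditionsDatabase({"alignment/run-1": "geometry-v1"})

# The central server is the only tier that talks to the database.
frontier_server = CachingTier(database)

# One cache per remote site (three sites is an arbitrary choice here).
site_caches = [CachingTier(frontier_server) for _ in range(3)]

# 1,000 worker requests arrive at each site for the same record.
results = [
    cache.fetch("alignment/run-1")
    for cache in site_caches
    for _ in range(1000)
]

print(len(results), "requests served with", database.queries, "database query")
```

In this toy setup every request gets the same answer, yet the database is consulted only once, because each tier absorbs the repeat requests of the tier below it.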
The Frontier system uses HTTP, the same language Web sites use to communicate with Web browsers, to send database requests out to servers. HTTP is nimble enough to deliver information over long distances in multiple short bursts, and it is designed to handle huge numbers of users. Without Frontier, experiments would communicate through database queries better suited to a smaller number of local users.
Thanks to a recent upgrade by Dykstra, the system now saves even more time and computing power by skipping the step of reloading information if the detector maps have not changed. Frontier has earned its popularity, but like the computers it keeps supplied with new data, it must keep adapting to keep up with the changing landscape.
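One way to picture the "skip the reload if nothing changed" step is the conditional request that HTTP offers in general: a cache tells the server which version it already holds, and the server replies with a short "304 Not Modified" instead of the full payload. The sketch below simulates that exchange in plain Python with no network. Whether Frontier implements the upgrade exactly this way is an assumption, and the class names and version counter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int        # 200 = full payload sent, 304 = not modified
    version: int
    payload: str = ""  # left empty on a 304


class Origin:
    """Hypothetical server holding the current detector conditions."""

    def __init__(self, payload):
        self.version = 1
        self.payload = payload
        self.payloads_sent = 0  # counts full transfers only

    def update(self, payload):
        self.version += 1
        self.payload = payload

    def get(self, known_version=None):
        # If the caller already has the current version, send no payload.
        if known_version == self.version:
            return Response(304, self.version)
        self.payloads_sent += 1
        return Response(200, self.version, self.payload)


class RevalidatingCache:
    """Keeps its copy and only replaces it when the origin reports a change."""

    def __init__(self, origin):
        self._origin = origin
        self._version = None
        self._payload = None

    def refresh(self):
        response = self._origin.get(self._version)
        if response.status == 200:
            self._version = response.version
            self._payload = response.payload
        return self._payload


origin = Origin("geometry-v1")
cache = RevalidatingCache(origin)

first = cache.refresh()       # nothing cached yet: full download
second = cache.refresh()      # unchanged: origin answers 304, copy is reused
origin.update("geometry-v2")
third = cache.refresh()       # changed: full download again

print(first, second, third, origin.payloads_sent)
```

Three refreshes cost only two full transfers here; the saving grows with the number of caches and with how rarely the detector conditions actually change.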
—Kathryn Grim, Fermilab
Tags: Americas Feature Middleware
Feature - From EGEE to EGI: Plain talk with Bob Jones and Steven Newhouse
At the Uppsala Gala Dinner, Bob Jones of EGEE handed over to Steven Newhouse of EGI his most prized possession — a crown made from all the name tags he collected from conferences in the past six years. Image courtesy GridTalk
After six years, on 1 May, EGEE will hand over responsibility for the world’s largest grid infrastructure to a new organization dedicated to its coordination and development (EGI.eu), and its newly elected director, Steven Newhouse.
During its lifetime, EGEE — Enabling Grids for E-sciencE — assembled a world-wide infrastructure of CPU cores, hosted by computing centers around the world. Each month, about 13 million jobs are executed on the EGEE Grid.
This massive multi-disciplinary production infrastructure was led until now by Bob Jones who initially, like Steven, held the position of technical director at EGEE, and quickly advanced to project director.
During the 5th and last EGEE User Forum in Uppsala, Sweden, Rüdiger Berlich of Karlsruhe Institute of Technology discussed with Bob and Steven topics ranging from the need for sustainability to the relationship between grids and clouds. Here are their comments, in Question-and-Answer format.
Rüdiger Berlich: How would you, in a few words, define today’s grid infrastructures?
Steven Newhouse: Grids are effectively a mechanism for bringing together computing resources located in different administrative domains for secure, accounted-for access.
Bob Jones: In terms of grid infrastructure deployment, it is at a global level which has reached production operation. However, there is still a lot of work to be done to make grids easier to use and cheaper to operate.
Berlich: Where does EGEE fit into this?
Jones: EGEE has done a lot of work to push forward production grid deployment and acted as a good showcase of what is possible with production grids.
Newhouse: EGEE has been working for 6 years — and for 3 years in the European Data Grid before this — on these secure, accounted-for access mechanisms for services needed to support high-throughput data analysis.
Berlich: Can you describe the organization's major achievements?
Jones: Putting in place the largest collaborative production grid infrastructure in the world for e-science. We have demonstrated that such a production infrastructure can be used by a wide range of research disciplines. It has produced scientific results in these disciplines and allowed us to do things which would not have been possible without this infrastructure.
Thus, through EGEE, scientists were able to do more science and on a larger scale, and get results in a shorter time frame. EGEE has formed collaborations within Europe and allowed Europe to collaborate as a whole with other regions. This will last.
Berlich: Can you give us a few numbers regarding the total investment in grid infrastructures over the course of EDG's and EGEE(I-III)'s lifetime?
Jones: The European Commission has contributed about 70 million euros. The total budget was in the range of 150-200 million euros, depending on how the partners’ contributions are counted.
Berlich: Can you describe the infrastructure that has been created?
Jones: EGEE is present in 50 countries around the world. 300 sites contribute to its infrastructure, comprising some 150,000 CPU cores. There are more than 15,000 users. The majority of these users will continue to have access to resources via National Grid Initiatives (NGIs) — these are organizations such as the D-Grid alliance, which manage aspects of national grid deployment — inside of EGI.
"50 countries, 300 sites, 150,000 CPU cores, 15,000 users" — Bob Jones, on EGEE
Berlich: What is the relationship between EGEE and the Worldwide LHC Computing Grid (WLCG)?
Jones: There are several layers. First of all, there is a continuous exchange of ideas, people and technology. While EGEE has been a multi-disciplinary infrastructure from day one, WLCG has specifically been created for the needs of the LHC experiments and collaborations. WLCG makes use of several grid infrastructures, of which EGEE is the largest. Other infrastructures include OSG in the US and NDGF in the Nordic countries.
Berlich: Where do you see major differences between grids and clouds, and where do they overlap?
Jones: They share the same heart. Amazon-style clouds provide users with a simpler interface than EGEE. By the same token, their middleware is also simpler. Clouds have a far more understandable and obvious business model. The e-science grid world is all about collaboration — bringing together resources that partners had anyway. Clouds cannot satisfy all the needs of grid users today. In particular, aspects of collaboration and result-sharing in virtual organizations are not well covered by clouds today, but will probably come in the future. Many of the more complex data management aspects are not there either at this time.
Berlich: Will grids and clouds converge?
Jones: Aspects of clouds are being picked up and must be implemented in grids. In particular, the interface must get simpler. Virtualization is already present in both and will become even more present in grids as time goes by.
I strongly believe we will see links being developed between commercial cloud offerings and collaborative grids.
Steven Newhouse of EGI toasting the new era. Image courtesy GridTalk
Berlich: How will EGEE's mission be carried on after the end of EGEE?
Newhouse: EGI.eu will continue the coordination of European resources for international collaborations.
Jones: We are happy that EGI.eu is there and see it as a culmination of a lot of what has been done in EGEE over the last few years. I am very pleased to see that there is a sustainable structure in place in Europe.
Berlich: What will happen to EGEE's technical developments, such as gLite and the physical infrastructure after the end of EGEE?
Newhouse: gLite — middleware for grid computing — will continue to be supported and developed by the gLite Open Collaboration which will participate in the EMI project. The physical infrastructure will continue being coordinated by EGI.eu on behalf of its NGI & EIRO stakeholders. EGI.eu will look to deploy software from any external software provider that delivers software of high quality that is needed by our user community.
Berlich: With respect to gLite, from the perspective of “academic software development,” what worked and what didn’t?
Jones: We managed to assemble a software suite picking the best components available and developing our own to fill in the gaps. I also think we did a good job in the certification and testing part of gLite. I am, however, not so happy with the overall interfaces between the components and the ability to install only subsets of it. This could have been better layered.
Newhouse: It brought a community together that was able to collaboratively develop software to meet the requirements of a user community. The processes for assessing the success of the software (and to stop developing failed prototypes) and to focus on what works and to build on the work of other software developed outside the project was probably one of our least successful activities.
Berlich: Can you describe EGI.eu's role and where its mission differs from that of EGEE?
Newhouse: EGI.eu will primarily focus on sustainable coordination of the resource providers in Europe to provide an integrated, secure, reliable infrastructure to meet the needs of its user community.
Berlich: Where do the NGIs fit in?
Newhouse: NGIs are at the heart of EGI.eu as its stakeholders and as the major providers of the resources within their own national infrastructures. Frequently, the same NGIs will also have strong relationships with the local national elements of the European-wide user communities that EGI will support.
Berlich: Who do you see as the big users of the infrastructure in the future?
Newhouse: Having access to an integrated European infrastructure provides greatest returns to research communities that have European-wide collaborations — and beyond. Within Europe the ESFRI projects — i.e., other large international research collaborations. (ESFRI is the European Strategy Forum on Research Infrastructures, a strategic instrument to develop the scientific integration of Europe and strengthen its international outreach.)
“Virtualization has had a radical impact.” — Steven Newhouse
Berlich: What is EGI's policy on industry involvement?
Newhouse: Very open to collaborations where it provides benefits to our user community. I see several areas relating to the infrastructure itself: as providers of cloud resources, providers of software for deployment on the infrastructure for applications, and for management of the infrastructure through standard operational tools. We are also able to support commercial organizations using our resources for pre-competitive research work.
Berlich: What role does virtualization play and what interfaces will exist between grids and clouds?
Newhouse: Virtualization has had a radical impact on the way data centers in the commercial space deliver computing resources to support their transactional workloads to provide more efficient data centers, both in terms of human resources and in their energy footprint. Similar challenges now confront many of the data centers within EGEE and now EGI as they attempt to support the different service environments required by the increasingly diverse application communities using the production infrastructure.
Introducing a virtualization layer across the European Research Infrastructure could move the software deployment decisions away from the sites and back into the virtual organizations using the infrastructure.
Providing a secure, authorized and accounted-for mechanism across Europe for starting virtual machines on remote sites is in many ways no different from the currently agreed-upon procedures for starting jobs on remote sites. Virtual organization managers, or operations staff acting on their behalf, would prepare, deploy and monitor the software required by that domain.
The Forum’s main auditorium was an appropriate place to talk about architecture. Image courtesy GridTalk
Through the virtualization layer, different virtual organizations would be able to deploy the software needed by their community at an update cycle appropriate to their own work. The workload produced within the ERA is primarily based around data. The high performance research networks around Europe enable the rapid predictable movement of data between sites — many of which have the ability to store many petabytes of data. This capability and associated cost is distinct from that offered by many commercial cloud providers, mainly because the business model within the research networks makes this usage free to the end-user.
However, for activity that is more computing- than data-focused, such an architecture provides a bridge to commercial cloud providers, as additional VMs can be deployed into a commercial cloud and their services integrated into the broader infrastructure available for a VO.
Berlich: How do you rate industry interest in grid technologies, from EGEE's perspective?
Jones: A number of industrial applications have been developed for the EGEE environment. However, from the outset, industry doesn’t seem to have been very interested in shared infrastructures. Our technology has been deployed in-house by companies, though. That is why we adopted an Open Source approach with a business-friendly Open Source license. We thus believe that the process of technology transfer from research to industry has succeeded.
Berlich: Bob, what will you do after the end of EGEE?
Jones: I will remain at CERN and will still be involved in grids and e-Infrastructures, as well as their use by the research communities in Europe.
—Rüdiger Berlich for iSGTW. Berlich is responsible for the dissemination and outreach activities of the Swiss/German EGEE federation.
Tags: EGEE Europe European Grid Initiative Feature People Project Profile

