The Importance of Open Data for Data Journalism

Open Data allows public administrations, non-governmental organizations and companies to make information available to other entities for reuse. Opening up data enables the creation of new products and services with added value.

It is undeniable that the rise of the Internet has made a large volume of the information generated every day available on the network. Analyzing and managing this mass of data, and presenting it in a way users can understand, is the work of companies known as the infomediary sector.

The Infomediary Sector in Spain is made up of companies that build services on public information and market them to third parties.

This sector includes companies created explicitly to market public information, as well as those with a specific area or department dedicated to creating new products from public data. Data Journalism is one of the areas that make up this sector.

According to a study by ONTSI (the National Observatory of Telecommunications and the Information Society), the Infomediary Sector in Spain comprised, in 2014, more than 400 companies generating marketable services from Open Public Data.

Before reflecting on the importance of Open Data for Data Journalism, we will analyze the report produced by the Ministry of Industry to assess the weight of the Infomediary Sector in the country’s economy.

How important are Open Data in the Infomediary Sector?

The latest ONTSI report estimates that the reuse of public information generates an annual turnover of around 500 million euros.

If we also take into account information from private sources, the figure rises to 1.2 billion euros. This represents 6% growth in the sector’s overall activity compared with the previous edition of the report.

The report estimates that there are 4,500 jobs linked to the infomediary activity of reusing public information. The number of companies oriented to the infomediary business amounts to 413. Of these, 364 operate with Open Data drawn from the public information of the country’s institutions and administrations.

Public-sector Open Data generates greater transparency, since it is an excellent means of communicating how public affairs are managed and of supporting accountability. It allows the Administration to be more transparent in managing public services and to build trust among the population.

Opening up data connects citizens with public entities, which fosters communication and helps improve public services.

Open Data enables the creation of a community of informed users, who can exchange ideas and opinions, face challenges and find solutions. In short, the opening of public information improves the processes of citizen participation.

Another aspect reflected in the report is that companies in the sector agree in identifying Smart Cities and the availability of Real-Time Open Data (RTOD) as the great opportunities for growth and consolidation.


Assange denies receiving data from Russia or China

The founder of the WikiLeaks leak platform, Julian Assange, denied that his organization received information from Russia or any other government, after the CIA accused WikiLeaks of being a hostile “intelligence service” abetted by the Kremlin.

“Our source is not a state of any kind: neither Russia nor China. We have said that repeatedly and publicly from the beginning,” Assange asserted in an interview on Monday evening (17.04.2017) with the Mexican journalist Carmen Aristegui on CNN.

Likewise, Assange ruled out that hacking by the Russian government during the US elections served as a source of information for his platform. “WikiLeaks has published more than 600,000 documents about Russia and Syria, and WikiLeaks has also published aggressively about Russia and its allies,” he added.

Last week, the head of the Central Intelligence Agency (CIA), Mike Pompeo, labeled WikiLeaks a “hostile intelligence service” toward the United States, “often abetted by state actors like Russia.”

The remarks came after Assange’s platform released a series of computer codes used by the CIA in its hacking operations.

During the interview, Assange referred to the possible relationship between Russia and the election campaign of the current US president, Donald Trump, and said his organization has no evidence or opinion on the matter. “If we get it, of course we will publish it,” he said.

Linked open data

Linked open data is a concept that systematically combines two different modes of data management present on the Internet:

  1. Linked data (following the semantic web principles set out by Tim Berners-Lee);
  2. Open data.

The data are related to each other through “subject-predicate-object” combinations (the predicate attributes something to the subject, expressing a relationship over a predefined set of individuals). Within an RDF graph, such a combination is called an “RDF triple” in the jargon of data managers (RDF stands for Resource Description Framework, the graph model used to formally describe web resources and their metadata, so that these descriptions can be processed automatically with a degree of interoperability).
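The triple model can be sketched without any RDF library by representing each statement as a plain (subject, predicate, object) tuple; the identifiers below are illustrative shorthand, not real DBpedia URIs:

```python
# Minimal sketch of an RDF-style graph: a list of
# (subject, predicate, object) triples. Identifiers are illustrative.
triples = [
    ("dbpedia:Madrid", "rdf:type", "dbpedia-owl:City"),
    ("dbpedia:Madrid", "dbpedia-owl:country", "dbpedia:Spain"),
    ("dbpedia:Spain", "rdf:type", "dbpedia-owl:Country"),
]

def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# All facts whose subject is Madrid:
print(match(triples, s="dbpedia:Madrid"))
```

Real triple stores apply the same pattern-matching idea, only at scale and with full URIs.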

Linked open data is closely linked to the development of the semantic web. The latter goes back to an idea of the inventor of the web, Tim Berners-Lee, who published an article in 2001 presenting the concept of the semantic web. The idea proved difficult to grasp and to put into practice, probably because of the underlying technical complexity. In 2006, Berners-Lee published a second article, entitled simply Linked Data. The concept then began to become popular and gained visibility.



  • DBpedia is one of the best-known and largest examples. It adopted the standards of the linked open data network and those of the semantic Web, which quickly interconnected it with other web repositories such as GeoNames, MusicBrainz, the CIA World Factbook, Project Gutenberg and Eurostat. Its data repositories are accessed by querying the database via SPARQL. Because the information is stored with the Resource Description Framework, resource documents related to a concept can also be retrieved directly via a URI.
  • Clean Energy info portal: for Denise Recheis (an Austrian expert in knowledge management), the Clean Energy info portal and the Energy Info wiki are designed as gateways to a “mine of information” on the problems of renewable energy, energy efficiency and climate change. They are hosted respectively by REEEP (Renewable Energy and Energy Efficiency Partnership) and NREL (National Renewable Energy Laboratory), two organizations strongly committed to the idea of open and linked data, which have incorporated its essential principles.
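As a sketch of how such SPARQL access works in practice, the snippet below builds a request for DBpedia’s public endpoint. The query and endpoint are standard, but only the URL is constructed here; the network call itself is left to the reader:

```python
from urllib.parse import urlencode

# Illustrative sketch: building a request for the public DBpedia SPARQL
# endpoint. The query asks for a few resources typed as cities.
query = """
SELECT ?city WHERE {
  ?city a <http://dbpedia.org/ontology/City> .
} LIMIT 5
"""

params = urlencode({"query": query,
                    "format": "application/sparql-results+json"})
url = "https://dbpedia.org/sparql?" + params
print(url)  # paste into a browser, or fetch with urllib.request, to run it
```

The same pattern works for any endpoint that accepts the query as an HTTP GET parameter.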

The technologies used in open data

Open data is subject to a single constraint: it must be accessible to all types of machines so that it can be processed. This implies that open data must be interoperable. If data do not comply with the Web standards that enable this interoperability, we speak of locked data, because their potential for reuse is low or almost nil.

This interoperability requirement was never really met in the industry until 2008, when SPARQL became a W3C recommendation. This query language lets developers test their applications on online open data directly from their web browsers, and develop their own programs to analyze the data. Data can thus be consumed remotely without having to transform or move it. For example, the governments of the UK and the US have begun to publish their data as Linked Open Data (LOD), respecting W3C standards and providing a SPARQL access point for developers.

Example of genuinely open data: finding on a map the schools closest to home, using data from the UK government, which provides all school-related data for its territory.
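Results from a SPARQL access point are typically returned in the W3C SPARQL 1.1 Query Results JSON Format. A minimal sketch of extracting the bound values follows; the sample payload, variable name and example URIs are invented for illustration:

```python
# Sketch: reading a SPARQL JSON result (W3C "SPARQL 1.1 Query Results
# JSON Format"). The payload below is hand-written sample data.
sample = {
    "head": {"vars": ["school"]},
    "results": {"bindings": [
        {"school": {"type": "uri", "value": "http://example.org/school/1"}},
        {"school": {"type": "uri", "value": "http://example.org/school/2"}},
    ]},
}

def values(result, var):
    """Collect the bound values of one variable from a SPARQL JSON result."""
    return [b[var]["value"] for b in result["results"]["bindings"] if var in b]

print(values(sample, "school"))
```

In a real application, `sample` would be the parsed JSON body of the HTTP response from the endpoint.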

A consequence of this political choice to offer a true standard for public data, such as Web data, is described in the book “Linking Government Data”. The book describes how the Web of data rose from some 40 million RDF triples in four data warehouses in 2007 to 203 warehouses with more than 25 billion triples and 395 million links at the end of 2010. The most optimistic speak of exponential data growth and announce Web 3.0, and even a potential point of singularity in the future. Nevertheless, this political choice has opened up new avenues of scientific, economic and social research.