A Review of Semantic Annotation in the Context of the Linked Open Data Cloud

ABSTRACT


INTRODUCTION
Semantic annotation is a modern technology that helps readers understand the content of a text [1]. It can also be described as the process of adding information, or metadata, to a text to enhance its meaning. Large volumes of input data are often ambiguous, and raw data is considerably harder to understand than formal text [2]. Making text both machine-readable and human-readable is therefore important, because it supports information extraction across many subjects and has motivated a large body of research on text annotation. This paper reviews the literature on semantic annotation, including earlier research, and considers the tools, methods, and techniques of semantic annotation and their relationship with Linked Open Data (LOD).
Semantic annotation is a significant contemporary technology used in text mining to make a selected text clear and understandable rather than ambiguous [3]. An annotation can be defined as a comment added to a text, image, or diagram to enrich the target information; it may take the form of a text description, underlining, highlighting, an image, a link, and so on [4]. In software programming, the term refers to text comments embedded in the code that are ignored at run time. Creating annotations on text involves four important stages.
Pre-processing is the initial step in the annotation process. It entails cleaning and formatting the data so that it is ready for the annotation task. This may involve removing undesirable characters or symbols, converting the data to a particular format, and locating any problems or errors. Pre-processing is a crucial step because it ensures that the data is prepared for annotation and that the annotation process will go smoothly.
The creation of annotation guidelines is the second stage [5]. Annotation guidelines are the rules or standards that describe how the data should be annotated. They set out the categories or labels to be used, the conventions to follow, and any standards that must be observed during the annotation process. Guidelines are important because they ensure that the annotation process is consistent, that the annotated data is of good quality, and that it can be used successfully for its intended purpose.
Annotation itself is the third step. Annotation means applying the guidelines to the data in order to label it with the desired categories. This could entail identifying and categorizing named entities, including persons, locations, and organizations, or assigning a sentiment or emotion to the text. Human annotators can annotate documents manually, or machine learning algorithms and other Natural Language Processing (NLP) technologies can annotate documents automatically. Although manual annotation is frequently regarded as the best practice, it can be expensive and time-consuming. Automatic annotation may be quicker and more efficient, but it might not be as precise or dependable as manual annotation.
Quality control is the fourth and last step [6]. It entails reviewing and assessing the annotated data to ensure that it is precise, reliable, and of high quality. This may involve error analysis, which identifies and fixes flaws in the annotated data, or inter-annotator agreement studies, in which several annotators annotate the same data so that the reliability of the annotations can be evaluated. Quality control is important because it ensures that the annotated data is trustworthy and useful for its intended uses.
Document annotation may be manual, semi-automatic, or fully automatic; manual annotation, for example, lets users add annotations to a web page and share them with others. Figure 1 illustrates the annotation process on raw text as an example. Figure 2 shows the representation of annotation and semantic annotation.
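The inter-annotator agreement studies mentioned in the quality control stage are often summarized with a chance-corrected statistic. The following is a minimal sketch, assuming exactly two annotators and using Cohen's kappa as one common choice; the label lists are hypothetical and are not taken from any study reviewed here.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical entity-type labels from two annotators for six tokens.
annotator_1 = ["PER", "ORG", "LOC", "PER", "O", "O"]
annotator_2 = ["PER", "ORG", "PER", "PER", "O", "LOC"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A value near 1 indicates strong agreement, while a value near 0 indicates agreement no better than chance; a low value is usually a signal to revisit the annotation guidelines.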
In the Semantic Web, data are collected in very large volumes, so during the annotation process the text must be segmented in order to be annotated easily and then understood. Figure 3 illustrates this operation.
Raw data is often ambiguous, which makes it difficult for a human or a machine to determine the meaning of a text; for example, a specific word may have several meanings, and the annotation process disambiguates the text. There are three types of annotation: formal, which is machine-readable; informal, which is human-readable; and ontological, which is both human- and machine-readable. Formal annotation is very close to semantic annotation [7]. Semantic annotation is the process of adding information about the meaning of text or data. It is also defined as the process of adding linguistic information to the available linguistic forms to make them more descriptive [8], or as the process of adding semantic information to linguistic functions. Suppose we have two sets of objects: documents and formal representations. Two functions can then be defined. The function that maps documents to formal representations is called annotation, and the function that maps formal representations to documents is called indexing. In semantic annotation, three aspects must be identified: entities, concepts, and the relationships between them. Figure 4 shows the semantic annotation architecture.
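The two functions described above, annotation (documents to formal representations) and indexing (formal representations back to documents), can be sketched in a few lines. This is only an illustration under strong assumptions: the "formal representation" is reduced to a set of concept identifiers, and the lexicon, the `ex:` identifiers, and the documents are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon mapping surface terms to concept identifiers.
LEXICON = {"paris": "ex:Paris", "france": "ex:France", "seine": "ex:Seine"}

def annotate(document):
    """Annotation: map a document to the set of concepts it mentions."""
    tokens = document.lower().replace(".", " ").replace(",", " ").split()
    return {LEXICON[t] for t in tokens if t in LEXICON}

def build_index(documents):
    """Indexing: map each concept back to the documents annotated with it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return index

docs = {
    "d1": "Paris is the capital of France.",
    "d2": "The Seine flows through Paris.",
}
index = build_index(docs)
print(sorted(index["ex:Paris"]))
```

Looking up a concept in `index` returns the documents that mention it, which is the basis of the semantic search use case discussed later in this section.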
A semantic annotation is a tuple $(a_s, a_p, a_o, a_c)$, where $a_s$ is the subject (the annotated data), $a_p$ is the predicate (the annotation relation), $a_o$ is the object (the annotating data), and $a_c$ is the context in which the annotation relating $a_s$ and $a_o$ was made. Figure 5 illustrates the semantic annotation.
In general, a semantic annotation has a body and a target. The body is the portion of the whole text that is to be annotated. The target includes the source text and a selector, and the target of a specific resource has a Uniform Resource Identifier (URI). The selector has three parts: the text position selector, the start position, and the end position of the body, as shown in Figure 6.
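The body, target, and selector described above can be modeled as a small data structure. The sketch below follows the terminology of this section rather than any particular annotation standard; the class names, the `annotate_span` helper, and the `example.org` URI are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TextPositionSelector:
    start: int  # offset of the first character of the body
    end: int    # offset one past the last character of the body

@dataclass(frozen=True)
class Target:
    source: str  # URI of the annotated resource (hypothetical here)
    selector: TextPositionSelector

@dataclass(frozen=True)
class Annotation:
    body: str    # the text span being annotated
    target: Target

def annotate_span(source_uri, text, phrase):
    """Build an annotation for the first occurrence of `phrase` in `text`."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in text")
    selector = TextPositionSelector(start, start + len(phrase))
    return Annotation(body=phrase, target=Target(source_uri, selector))

text = "Tim Berners-Lee proposed the Semantic Web."
ann = annotate_span("http://example.org/doc/1", text, "Semantic Web")
print(ann.target.selector)
```

Storing character offsets rather than a copy of the surrounding text keeps the annotation separate from the source document, so the same source can carry many independent annotations.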
Semantic annotation is the technique of tagging documents with pertinent concepts and adding information that connects the content to the concepts in a knowledge network. This facilitates the discovery, comprehension, and reuse of unstructured content. Its main benefits are as follows [9,10].
Better information retrieval: documents with semantic tags are simpler to locate, comprehend, integrate, and reuse.
Smarter knowledge management: transforming material into a more manageable data source contributes to improved knowledge management.
Semantic search and content aggregation: among the most popular uses made possible by semantic annotation are automated relationship discovery, information aggregation, and semantic search.
Automated operations: with the help of semantic annotations, computers can carry out tasks such as categorization, linking, inferencing, searching, and filtering.
Interoperability and reusability: semantic annotation improves the integration of data required for model parameterization and validation, as well as the interoperability and reusability of models.
Computer vision and information retrieval: semantic annotation is crucial in both fields. It offers comprehensive semantic details about unseen images, including their semantic type and related visual relationships.
The objective of this review is to identify the latest techniques and tools used in semantic annotation and to survey what researchers have done. Its primary goal is to answer the research questions and obtain a clearer understanding of semantic annotation. One impact of semantic annotation on text data is improved knowledge management, achieved by turning knowledge into smarter and more manageable data [11]. When dealing with noisy and inaccurate parsers, semantic annotation also increases the amount of extracted data and provides a better representation of ambiguity.

Figure 6. Semantic annotation components
A further impact of semantic annotation is improved discovery and interoperability, which promotes reusability and reproducibility. One of the most significant semantic annotation approaches, and one considered state of the art, is Named Entity Recognition (NER). This technique aims to identify and classify named entities in the input corpus, such as persons, organizations, and locations. Deep learning methods such as Bidirectional Long Short-Term Memory (BiLSTM) and the Bidirectional Encoder Representations from Transformers (BERT) model have achieved important improvements in NER tasks by capturing contextual information. This research focuses on a specific semantic annotation technique in which deep learning is used for word embedding and NER, and the results are mapped to LOD for evaluation purposes.
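The idea of recognizing entities and mapping them to LOD resources can be illustrated without a trained model. The sketch below deliberately swaps the deep learning models named above for a plain dictionary (gazetteer) lookup, so it shows only the shape of the output, not the BiLSTM or BERT approach itself. The gazetteer entries are hypothetical, and the DBpedia URIs are included purely for illustration.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, LOD resource URI).
GAZETTEER = {
    "Angela Merkel": ("PERSON", "http://dbpedia.org/resource/Angela_Merkel"),
    "Siemens": ("ORGANIZATION", "http://dbpedia.org/resource/Siemens"),
    "Berlin": ("LOCATION", "http://dbpedia.org/resource/Berlin"),
}

def tag_entities(text):
    """Return (start, end, surface, type, uri) for each gazetteer match."""
    matches = []
    for surface, (etype, uri) in GAZETTEER.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), surface, etype, uri))
    return sorted(matches)  # order matches by position in the text

sentence = "Angela Merkel visited the Siemens campus in Berlin."
entities = tag_entities(sentence)
for start, end, surface, etype, uri in entities:
    print(surface, etype, uri)
```

A learned NER model would replace `tag_entities`, and a separate entity-linking step would choose the URI; the dictionary lookup cannot resolve ambiguous names, which is exactly the limitation that contextual models address.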

PROBLEM STATEMENT
Most research notes that the input data for Semantic Web and ontology-based systems is very large; some of this data is structured and some is unstructured. Navigating the input data is therefore difficult, whether the input is text, documents, or any other format [12]. Some studies report that large amounts of data suffer from ambiguity, meaning that a word may have multiple meanings. This makes it difficult for machines to understand the intended meaning of the text and can lead to errors or inaccuracies in natural language understanding and information extraction tasks. For this reason, some studies take the contextual meaning of terms into account to help computers understand and analyze the text more effectively [13].
Another issue with semantic annotation is the high cost of manual annotation, which can be time-consuming and resource-intensive. Automated annotation can help to reduce these costs, but there are still challenges associated with achieving high levels of accuracy and consistency with automated annotation, particularly when dealing with complex and diverse data sources [14].
Some researchers have noted that the data suffers from heterogeneity, because data from different sources has different formats, ontologies, and vocabularies [15,16]. The semantic annotation process also requires high-quality data to ensure accurate and consistent annotation, and the quality of the data can be affected by factors such as errors, omissions, and inconsistencies [17,18]. In addition, semantic annotation requires regular maintenance and updating to ensure that the annotation remains accurate and relevant over time, especially as the data evolves.
In light of the above, the goal of this research is to ease navigation through text by using semantic annotation tools to reduce errors and ambiguity in the input text. To this end, the dataset should be chosen carefully to ensure the quality, consistency, and homogeneity of the data, and the time consumed by manual annotation should be reduced. These gaps can be addressed by using deep neural networks and machine learning methods.

LOD
LOD is one of the basic foundations of the Semantic Web (also known as the Web of Data). The Semantic Web is about connecting datasets in ways that both humans and machines can comprehend. Linked Data provides the recommended practices for establishing these links. In other words, Linked Data is a set of design principles for sharing interconnected, machine-readable data on the Web [19].
Linked Data and LOD are two sets of design guidelines for sharing machine-readable, interrelated data on the Web. Linked Data refers to machine-readable, semantic data that a machine can "understand". The semantics come from links, and most Linked Data development concentrates on ontologies that give words meaning and on tools that interpret data expressed as Microdata, RDFa, or RDF. LOD is created when Linked Data is combined with Open Data (data that can be freely used and disseminated). An RDF database, such as GraphDB from Ontotext, is one type of database for LOD. It can handle large datasets from many sources and connect them to Open Data, which facilitates effective data-driven analytics and knowledge discovery.
The term LOD refers to a collection of best practices for publishing and connecting structured data on the web. The data is open in the sense that it can be freely accessed, used, and reused by anyone for any purpose. LOD enables data from many sources to be integrated and joined, allowing more meaningful and valuable information to be derived from it. The main technical components of LOD are the use of URIs to identify resources, the use of RDF to represent data in a graph-based format, and the use of standard RDF vocabularies to describe the relationships between resources. The use of standards such as SPARQL and RDF allows the data to be accessed and queried uniformly [20]. Figure 7 illustrates that LOD includes classes of objects, such as persons, organizations, locations, and documents; relationship types, such as owner, manufacturer, and author of a book; and attributes, such as date of birth and the population of a geographic region. In conclusion, the information available in LOD is very large and varied.
The Resource Description Framework (RDF) is a data model that can be used to convey information about both concrete items and abstract ideas. It uses a graph structure to represent the relationships between things. Anything can be described with RDF, including people, pets, things, and ideas of all kinds. Information is conveyed in statements with the following syntax: <subject> <predicate> <object>. Each statement expresses the relationship between the subject and the object. Both the subject and the object are resources [21]. The SPARQL Protocol and RDF Query Language (SPARQL) is an RDF query language that can be used to retrieve and modify data stored in RDF format. Users can use it to query any data source that can be mapped to RDF, including databases. The data is internally expressed as triples with subjects, predicates, and objects and is treated by SPARQL as a directed, labeled graph [22].
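The subject, predicate, object statement form described above can be shown concretely. The following is a minimal sketch, assuming plain Python tuples as the in-memory form and a simplified N-Triples-style output with no escaping or datatype handling; the `example.org` IRIs are hypothetical, while the FOAF namespace is the public one.

```python
# Each statement is a (subject, predicate, object) triple.
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace for the example resources

triples = [
    (EX + "alice", FOAF + "name", '"Alice"'),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", '"Bob"'),
]

def to_ntriples(statements):
    """Serialize triples line by line: IRIs in angle brackets, literals as given."""
    def term(t):
        return t if t.startswith('"') else f"<{t}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements)

print(to_ntriples(triples))
```

Because the object of one triple (`bob`) is the subject of another, the three statements already form a small directed, labeled graph.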
Uniform Resource Identifiers (URIs) in LOD: every resource on the Semantic Web has a URI. URIs are distinct identifiers used to refer to content that has been published online. They are essential to the Findability, Accessibility, Interoperability, and Reusability (FAIR) of statistical data. By serving as specific "addresses," URIs make data easier to locate. Once the data has been located, the URI allows a user to access it easily. URIs offer a common format that facilitates machine-to-machine communication by making it simple to combine data with other information [23]. LOD faces some challenges and limitations, such as a lack of distinction among building materials, a lack of clarity and specificity, and a lack of alignment with the level of information needed. Maintaining visual quality and optimizing efficiency are two factors that must be balanced when adopting LOD. To obtain the required outcomes in real-time rendering applications, overcoming these obstacles and limits calls for a mix of technical know-how, creative considerations, and meticulous optimization [24].

ONTOLOGY
Within the fields of philosophy and computer science, ontology pertains to the formalization of knowledge in a certain area or subject. It is a methodical framework for classifying and arranging the concepts, entities, relationships, and attributes that make up a given domain [25]. Formally, an ontology is a structure $O := (C, \leq_C, R, \sigma, \leq_R)$ consisting of (1) two disjoint sets $C$ and $R$, whose elements are called concept identifiers and relation identifiers, respectively; (2) a partial order $\leq_C$ on $C$ called the concept hierarchy or taxonomy; (3) a function $\sigma: R \to C \times C$ called the signature; and (4) a partial order $\leq_R$ on $R$ called the relation hierarchy. Figure 8 depicts an example of an ontology of foods.
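The formal structure above can be made concrete with a toy ontology. This is a sketch under stated assumptions: the concept and relation names are hypothetical (loosely themed on food, but not taken from Figure 8), the concept hierarchy is given by its direct edges and must be acyclic, and the relation hierarchy is omitted for brevity.

```python
# Concept identifiers C and relation identifiers R (disjoint sets).
C = {"Food", "Fruit", "Apple", "Person"}
R = {"eats"}

# Direct edges of the concept hierarchy: (child, parent) means child <=_C parent.
DIRECT_SUBCONCEPT = {("Fruit", "Food"), ("Apple", "Fruit")}

# Signature sigma: R -> C x C gives the domain and range of each relation.
SIGMA = {"eats": ("Person", "Food")}

def is_subconcept(child, parent):
    """True if child <=_C parent in the reflexive-transitive closure."""
    if child == parent:
        return True
    return any(
        is_subconcept(mid, parent)
        for (low, mid) in DIRECT_SUBCONCEPT
        if low == child
    )

def relation_allows(relation, subject_concept, object_concept):
    """Check a use of a relation against its signature."""
    domain, rng = SIGMA[relation]
    return is_subconcept(subject_concept, domain) and is_subconcept(object_concept, rng)

print(is_subconcept("Apple", "Food"))
print(relation_allows("eats", "Person", "Apple"))
```

The signature check is what lets an annotation tool reject a statement such as "Apple eats Person" while accepting "Person eats Apple", because Apple is a subconcept of Food.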

Figure 8. An Example Ontology of Food
An ontology, as commonly used in knowledge management, is a model that represents knowledge as a collection of concepts within a certain domain and records the connections between them. It compiles information inside an organization as a model, which users then query to address challenging questions and highlight connections throughout the organization [26]. People today encounter more data in one day than most people in past decades encountered in their entire lives. The main issue is that this data exists in many different formats and has been captured in many different forms, which makes the relationships between different data very difficult to understand. At present, it is extremely complex to determine how policies are captured in written documents, how those policies link to the business processes documented in models, and how those business processes relate to the data stored in databases. Data must be structured in a way that permits these sorts of relationships to be discovered; an ontology captures the data in such a manner that these relationships can be seen.
RDF and the Web Ontology Language (OWL) are the two standards that govern the specification of ontologies. According to RDF and OWL, ontologies consist of two fundamental components: classes and relationships. Figure 9 illustrates how the class Person is related to the class Organization through the relationship type Has Employer. The combination of these classes and the relationship is called a triple, which consists of a subject, a predicate, and an object. An ontology specifies all the components needed in a domain and their relationships. Because tags are meaningless unless they are placed in some kind of context, it is futile to annotate a page with tags that are not connected to an ontology.
By offering a formal framework for classifying, expressing, and managing information within an organization or a particular topic, ontology plays a vital role in knowledge management. It helps organize knowledge by providing a formal structure that defines the relationships, entities, concepts, and properties of a specific domain. By representing knowledge in a consistent and standardized manner, ontologies enable efficient storage, retrieval, and navigation of information. Furthermore, ontologies provide a shared lexicon and semantic structure that facilitate knowledge exchange and interoperability. They enable cooperation and communication between people, groups, and systems by supplying a common understanding of concepts and relationships. Because ontologies support the integration of knowledge from diverse sources, information can be exchanged and combined more easily [27]. Ontologies can also map and convert data across forms and formats by specifying the mappings between concepts in the ontology and the corresponding components in other data sources. This supports data harmonization and standardization, which in turn facilitates data analysis, querying, and interpretation.

SPARQL
SPARQL (SPARQL Protocol and RDF Query Language) is a standard query language for retrieving data from RDF databases. It is used for querying and manipulating RDF data on the web and allows users to retrieve and modify the data stored in triple stores (RDF databases), in much the same way that SQL is used with relational databases. SPARQL provides an expressive way to describe complex queries across multiple RDF graphs and has become a standard for querying linked data on the web [28]. Figure 10 illustrates how to pick each element of a query and write it in the formal SPARQL format so that all elements are written in the same way. A typical simple SPARQL query retrieves the names and ages of individuals who are described as instances of a FOAF class.
RDF data, which is used by SPARQL, is information represented as triples (subject-predicate-object). RDF can describe complex connections and semantics and enables extensible and flexible data structuring.
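The kind of FOAF query described above, which retrieves names and ages, rests on matching triple patterns against a graph. The sketch below is not a SPARQL engine: it is a small pure-Python matcher that mimics how a basic graph pattern binds variables, with the corresponding query shown in a comment. The choice of `foaf:Person`, the `ex:` resources, and the data values are assumptions made for illustration.

```python
# Roughly equivalent SPARQL (assumed form, for orientation only):
#   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
#   SELECT ?name ?age
#   WHERE { ?p a foaf:Person ; foaf:name ?name ; foaf:age ?age . }

RDF_TYPE = "rdf:type"

# Toy graph with prefixed names for brevity; the ex: resources are hypothetical.
GRAPH = {
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:age", 30),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:age", 25),
    ("ex:acme", RDF_TYPE, "foaf:Organization"),
    ("ex:acme", "foaf:name", "Acme"),
}

def match(patterns, graph, binding=None):
    """Yield variable bindings satisfying every triple pattern.

    Terms that start with '?' are variables, as in SPARQL.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)

query = [
    ("?p", RDF_TYPE, "foaf:Person"),
    ("?p", "foaf:name", "?name"),
    ("?p", "foaf:age", "?age"),
]
rows = sorted((b["?name"], b["?age"]) for b in match(query, GRAPH))
print(rows)
```

The organization `ex:acme` has a name but is excluded, because the shared variable `?p` must satisfy all three patterns at once; this joining on shared variables is what distinguishes graph pattern matching from a simple filter.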

Scope and Use Cases
SQL is widely used in traditional relational database management systems (RDBMS) and is suitable for applications that deal with structured and tabular data. It is commonly used in business applications, data analysis, reporting, and transaction processing.
SPARQL is used to query and alter RDF data, which is frequently found in knowledge graphs, ontology-based systems, Semantic Web applications, and linked data repositories. With its graph-based queries, SPARQL makes it possible to navigate intricate relationships and discover new data.

RDF
RDF is a standard for modeling and interchanging information on the web. It is a way of describing resources and their relationships in a machine-readable format. RDF data is represented as a graph of triples, where every triple consists of a subject, a predicate, and an object. RDF is used in a variety of applications, such as the Semantic Web, knowledge management, and data integration. A question may arise: why RDF? Typical uses include the following. Incorporating machine-readable data into web pages using the well-known schema.org vocabulary allows for better search engine display or automatic processing by third-party applications. External datasets can be used to enhance one's own; for example, a dataset on paintings may be enhanced to give users access to a wealth of information on the relevant artists and associated websites by linking it to the Wikidata entries for those artists. Connecting Application Programming Interface (API) feeds makes it simple for consumers to learn how to obtain more data. Data aggregations can be built around certain subjects using datasets that have already been published as linked data. Distributed social networks can be constructed by connecting RDF descriptions of individuals from many sources. RDF provides a standards-compliant way of sharing database data. Finally, disparate datasets within an organization can be interconnected to enable cross-dataset SPARQL queries [29].
The RDF data model is based on three pairwise disjoint sets of RDF terms. The set $I$ of Internationalized Resource Identifiers (IRIs) is used to identify resources. The set $L$ contains literals, that is, strings that may be language-tagged and datatyped values. The set $B$ of blank nodes is interpreted as local existential variables. An RDF triple is a tuple $(s, p, o) \in (I \cup B) \times I \times (I \cup B \cup L)$, where $s$, $p$, and $o$ denote the subject, predicate, and object. An RDF graph $G$ is a set of RDF triples, where each triple $(s, p, o) \in G$ represents a directed labeled edge. The sets of subjects, predicates, and objects of $G$ are denoted by $S(G)$, $P(G)$, and $O(G)$, so that the set of nodes in $G$ is $S(G) \cup O(G)$. Figure 11 shows an RDF graph as an example.
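The positional constraints in this definition (blank nodes are not allowed as predicates, and literals are allowed only as objects) can be checked mechanically. The following is a minimal sketch, assuming one small class per kind of RDF term; the class names, helper functions, and example resources are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype_or_lang: str = ""

def is_valid_triple(s, p, o):
    """Check that (s, p, o) lies in (I u B) x I x (I u B u L)."""
    return (
        isinstance(s, (IRI, BlankNode))
        and isinstance(p, IRI)
        and isinstance(o, (IRI, BlankNode, Literal))
    )

def subjects(graph):
    return {s for s, _, _ in graph}

def predicates(graph):
    return {p for _, p, _ in graph}

def objects(graph):
    return {o for _, _, o in graph}

def nodes(graph):
    """Nodes of G: the union of its subjects and objects."""
    return subjects(graph) | objects(graph)

alice = IRI("http://example.org/alice")
name = IRI("http://xmlns.com/foaf/0.1/name")
knows = IRI("http://xmlns.com/foaf/0.1/knows")
G = {
    (alice, name, Literal("Alice")),
    (alice, knows, BlankNode("b0")),
}
print(len(nodes(G)), len(predicates(G)))
```

Here the blank node stands for "someone Alice knows" without naming that person, which is the local existential reading described above.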

Figure 11. RDF example
Vocabularies are used in the Semantic Web. The script illustrated in this section is a basic document describing an individual person using RDF and OWL. This example shows the format of the FOAF vocabulary, which is very common in the Semantic Web [30].

Semantic annotation process
As mentioned earlier, semantic annotation involves two main functions. The first maps documents to formal representations and is called annotation; the second maps formal representations to documents and is called indexing. The formal representation is modeled on ontologies because it describes the types of objects and the concepts together with their relationships and properties. Because an ontology is domain-specific, the goal of semantic annotation of textual input is to identify concepts with the support of the domain ontology. Semantic annotation is the operation of adding metadata to text, images, videos, or other digital content to describe its meaning and context. This information helps search engines and other applications comprehend the content and provide more accurate search results or recommendations [31]. Semantic annotations can be created using various standards, such as RDF, Microdata, or JSON-LD, and can include information such as the author, date of publication, title, and relevant topics or entities. The goal of semantic annotation is to enable better organization, discovery, and analysis of digital content [32][33][34][35].
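As an illustration of the metadata fields listed above (author, date of publication, title, and topics), the following sketch builds a JSON-LD description with schema.org terms using only the Python standard library. The identifier, names, and date are hypothetical placeholder values.

```python
import json

# A JSON-LD description of a document using schema.org terms.
# All values below are hypothetical placeholders.
annotation = {
    "@context": "https://schema.org",
    "@type": "Article",
    "@id": "http://example.org/articles/42",
    "headline": "An Example Article",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2024-01-15",
    "about": [{"@type": "Thing", "name": "Semantic annotation"}],
}

serialized = json.dumps(annotation, indent=2)
print(serialized)
```

Embedded in a web page, a block like this gives search engines and other applications the machine-readable description that the surrounding prose alone does not provide.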

Semantic Web
The Semantic Web is a set of standard technologies for realizing a web of data. It builds upon the traditional World Wide Web (WWW) by adding a layer of meaning to the information published on the web. This meaning is represented as metadata, or data about data, which describes the relationships and context of the information. The idea is that by making information more machine-readable, it can be easily processed, analyzed, and combined with other data sources to provide new insights and knowledge.
The Semantic Web uses technologies such as RDF, which is a flexible data model that describes resources and their relationships, and OWL, which is a language for expressing more complex relationships and rules between concepts. Additionally, SPARQL, a query language for retrieving information from RDF data sources, is used to access and manipulate the data stored on the Semantic Web.

Figure 12. Semantic Web Layers
The Semantic Web has the potential to transform a wide range of applications, including search engines, e-commerce, and knowledge management systems, among others, by enabling machines to comprehend the meaning of material on the web.It is seen as a way to bring structure and meaning to the enormous amount of unstructured data available on the web, making it easier to find, analyze, and use [36].
Figure 12 illustrates the so-called Semantic Web layers. The figure shows how the Semantic Web stack starts with URI and XML and reaches the trust layer; each layer depends on the layers below it, so that the final result can be trusted.

Deep Neural Network (DNN)
The role of DNNs in semantic annotation is crucial; they are considered a powerful tool for predicting and assigning meaningful tags to text. A large amount of labeled data is frequently needed to train machine learning models for NLP tasks. Annotation provides the labeled data needed to train models for many NLP tasks, such as sentiment analysis, named entity recognition, machine translation, and part-of-speech tagging. The usefulness and performance of these models are directly affected by the precision and quality of the annotations [37]. Annotation also plays important roles in supervised learning, dataset creation, and knowledge acquisition.
A DNN is a class of ML model whose function and structure are inspired by the human brain. DNNs can learn very complex patterns from very large datasets, which allows them to capture the semantic meanings of words in hierarchical form. A DNN consists of multiple layers designed to process data hierarchically; each layer learns a higher-level representation of its input, gradually building on the features learned by the lower layers. In NLP, deep neural networks such as transformer models and recurrent neural networks (RNNs) can be trained on very large textual datasets to learn semantic representations of words or sentences [38]. These representations can then be used to annotate text automatically with semantic information, for example through NER to identify entities such as people, organizations, or locations, or through sentiment analysis to determine the sentiment of a piece of text [39]. DNNs can significantly improve the accuracy and efficiency of semantic annotation tasks by automatically learning complex patterns and representations from the data [40]. However, it is important to remember that deep neural networks are not infallible and may still have limitations, such as bias in their training data, which should be carefully considered and addressed in the annotation process to ensure accurate and unbiased results [41].
In the context of NLP and language comprehension, DNNs and semantic annotation are closely connected. Semantic annotation labels or marks up text with additional semantic information, such as entities, connections, and meanings, in order to improve comprehension of the text. It is one of the many NLP tasks in which DNNs, and deep learning models more generally, have found widespread use. DNNs are well suited to tasks that require comprehending and extracting meaning from text because they are adept at learning intricate patterns and representations from vast volumes of data [42].

METHODS OF ANNOTATION
Semantic annotation is the operation of assigning entities to the input text [43]. It is suitable for any type of text data, such as web pages, documents, social media posts, and educational content. The result of annotation is a document or web page with machine-interpretable markup whose semantics are well defined.

Manual annotation
The key advantage of manual annotation is the ability of a human expert to understand the context and nuances of the data. A human annotator has the knowledge and judgment needed to interpret the input data. For instance, in NLP, human annotators can recognize sarcasm, irony, or textual ambiguity, which are often difficult for automated methods to detect. The contextual comprehension and domain expertise of the annotators contribute to the accuracy and quality of the annotated data, leading to better performance of machine learning models. Another advantage of manual annotation is its flexibility and adaptability to changing requirements, which ensure that the annotation process remains up to date and relevant. Furthermore, manual annotation plays an important role in handling complex or rare cases. In some domains, the data that requires annotation may not fit standard patterns or may contain outliers. Human annotators can handle such cases effectively by drawing on their expertise and judgment to provide accurate annotations. Manual annotation can also detect potential biases in the input data and mitigate them, keeping fairness and ethical considerations in mind throughout the annotation process.
Despite its advantages, manual annotation also has some challenges. The first is the possibility of human bias. A human annotator might have inherent biases, conscious or unconscious, that affect the annotations they provide. Bias in the annotation process can lead to biased machine learning models, perpetuating unfairness and discrimination. However, this challenge can be mitigated through appropriate guidelines, training, and regular quality control checks to ensure the accuracy and consistency of annotations [44]. Real-world applications of manual annotation include image and video annotation, text annotation, speech and audio annotation, medical annotation, social media and user-generated content, autonomous vehicles, and e-commerce and recommendation systems.

Semi-Automatic annotation
Semi-automatic annotation bridges the gap between manual and automatic annotation. It is a hybrid method that combines the strengths of both, pairing human annotation expertise with the efficiency of the machine. The process begins with human annotation, which provides the initial labeled data that serves as a training set for a machine learning algorithm. The algorithm then uses this training set to annotate the remaining data automatically. The process is repeated iteratively until the desired level of annotation accuracy is achieved. Semi-automatic annotation is efficient because it can rapidly annotate a large amount of data. This is beneficial in applications that require frequent updates of data annotation, such as customer sentiment analysis and social media monitoring. The semi-automatic method can also improve annotation accuracy: by combining human expertise with machine learning efficiency, it can achieve high accuracy, making it appropriate for applications that require high-quality annotations, such as legal document analysis and medical diagnosis [45].
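The iterative loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not a method from the reviewed work: the word-count classifier, the confidence threshold of 0.8, and the names train, predict, semi_automatic_annotate, and ask_human are all hypothetical, and ask_human stands in for a real human annotator.

```python
from collections import Counter, defaultdict


def train(labelled):
    """Count how often each word occurs under each label."""
    word_counts = defaultdict(Counter)
    for text, label in labelled:
        for word in text.lower().split():
            word_counts[word][label] += 1
    return word_counts


def predict(model, text):
    """Return (label, confidence); confidence is the winning label's share of votes."""
    votes = Counter()
    for word in text.lower().split():
        votes.update(model.get(word, {}))
    if not votes:
        return None, 0.0
    label, top = votes.most_common(1)[0]
    return label, top / sum(votes.values())


def semi_automatic_annotate(seed, unlabelled, ask_human, threshold=0.8, max_rounds=5):
    """Label confident items automatically; route one uncertain item per round to a human."""
    labelled = list(seed)
    pending = list(unlabelled)
    for _ in range(max_rounds):
        if not pending:
            break
        model = train(labelled)  # retrain on everything labelled so far
        still_pending = []
        for text in pending:
            label, confidence = predict(model, text)
            if confidence >= threshold:
                labelled.append((text, label))
            else:
                still_pending.append(text)
        if still_pending:
            text = still_pending.pop(0)
            labelled.append((text, ask_human(text)))
        pending = still_pending
    return labelled, pending


def ask_human(text):
    """Stand-in for a human annotator (hypothetical)."""
    return "neutral"


seed = [("great product love it", "positive"), ("terrible service hate it", "negative")]
unlabelled = ["love this great phone", "hate the terrible battery", "it arrived"]
labelled, pending = semi_automatic_annotate(seed, unlabelled, ask_human)
```

In this sketch the human is consulted only for items the model is unsure about, which is where the combination of human expertise and machine efficiency is expected to pay off.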
Despite its advantages, semi-automatic annotation also has some challenges. One challenge is that the annotations sometimes include errors, which can be propagated by the machine learning algorithms, leading to inaccurate automated annotations. Regular quality checks and feedback loops between human annotators and the machine learning algorithms are important for identifying and correcting errors in the annotation process. Real-world examples of this method include active learning, pre-annotation or seed annotations, weak supervision, transfer learning, and crowdsourcing. Automatic annotation, in turn, is a revolution in data annotation for artificial intelligence (AI) and machine learning (ML).

Challenges in Managing Ambiguity
Automatic annotation is a very fast approach that uses machine learning algorithms to annotate data without human intervention. Much recent research has focused on this method because it improves the scalability, efficiency, and cost-effectiveness of annotation [46]. Automatic annotation methods can handle huge datasets with ease, making them well suited to applications that involve big data, such as speech recognition, image recognition, and NLP. Automatic annotation also reduces the human bias that can affect the quality and fairness of data annotation, because it depends on data-driven algorithms that are not affected by subjective biases. This can lead to more consistent and objective annotation, especially in sensitive areas such as finance, healthcare, and legal applications, where the accuracy of annotated data is crucial. One limitation of automatic annotation is that ML algorithms do not always capture complex data accurately, especially when the data are context-dependent, rare, or ambiguous. These errors can lead to inaccurate annotation, so regular validation, feedback loops, and quality checks are important to ensure the accuracy and reliability of the annotation process. Another limitation is the lack of interpretability. ML algorithms, especially deep neural networks (DNNs), can be complex and opaque, which makes it difficult to understand and interpret how they annotate the input data [47]. Real-world examples of this method include object detection, text classification, speech recognition, named entity recognition (NER), image captioning, sentiment analysis, semantic segmentation, and document layout analysis. Table 2 summarizes the pros, cons, and differences of the three annotation methods: manual, semi-automatic, and automatic.
Pre-trained models and transfer learning are a recent development in annotation that involves fine-tuning models trained on extensive datasets for particular tasks. To decrease the amount of human annotation needed and increase annotation efficiency, future initiatives can incorporate pre-trained models and transfer learning strategies. Integrating AI with human expertise, and bringing together the best aspects of both, can make annotation procedures more precise and effective in the future.

BACKGROUND
Albukhitan et al. [48] explored the use of word embeddings produced by deep learning algorithms for the semantic annotation of Arabic documents. Food, nutrition, and health ontologies were used to evaluate the performance of the proposed model. The authors note that it is not feasible to annotate web documents semantically by hand because of the huge amount of web content. Semantic annotation is considered here as the process of adding machine-readable content to natural-language text in the form of RDF and ontologies: RDF is used for information extraction, and the ontology represents the concepts, relationships, and semantic rules that apply to the knowledge. A limitation of this research is that the semantic annotation has been done for Arabic, whereas most research papers address languages written in Latin script. With deep learning, word embeddings can be obtained by applying models such as CBOW and skip-gram. The semantic annotation framework includes the stages of data acquisition, data preprocessing, word embedding using Word2Vec, instance and candidate matrix generation, and training neural networks to calculate vector weights.
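To illustrate how word embeddings can drive annotation, the following minimal sketch assigns a term to the ontology concept whose vector is most similar to the term's vector. It is not the framework of [48]: the three-dimensional vectors are hand-written stand-ins for Word2Vec output, and the function names, concept vectors, and similarity threshold are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def annotate(term, embeddings, concept_vectors, min_similarity=0.7):
    """Return the most similar ontology concept, or None if nothing is similar enough."""
    vector = embeddings.get(term)
    if vector is None:
        return None  # term has no embedding, so it stays unannotated
    best_concept, best_score = None, min_similarity
    for concept, concept_vector in concept_vectors.items():
        score = cosine(vector, concept_vector)
        if score >= best_score:
            best_concept, best_score = concept, score
    return best_concept


# Toy vectors (hypothetical); real embeddings would come from a trained model.
concept_vectors = {
    "Food": [1.0, 0.0, 0.0],
    "Nutrition": [0.0, 1.0, 0.0],
    "Health": [0.0, 0.0, 1.0],
}
embeddings = {
    "apple": [0.9, 0.1, 0.0],
    "vitamin": [0.3, 0.9, 0.3],
    "diabetes": [0.05, 0.1, 0.95],
    "car": [0.5, 0.5, 0.5],
}
annotations = {term: annotate(term, embeddings, concept_vectors) for term in embeddings}
```

The threshold leaves terms such as "car" unannotated instead of forcing them onto the nearest concept, which is one simple way to limit wrong annotations.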
Campos-Rebelo et al. [49] observe that semantic annotation creates interoperability between heterogeneous systems, and they propose a set of semantic annotation rules for XML schemas. With SAWSDL (Semantic Annotations for WSDL and XML Schema), the semantic annotation is added to the XML Schema Definition (XSD) files and to the metadata that describes the XML message exchange operation. Each element of an XML schema (XSD) can be annotated with an ontology concept; for instance, the XML element "indoorTemp" has been annotated in this way.
The XML element indoorTemp is annotated with the indoor temperature concept of the ontology, as shown in Figure 13. An annotation path can express more than a plain SAWSDL model reference: it can annotate XSD elements with ontology concepts and properties, not only with concepts. The XSD schema elements are annotated with annotation paths, where every path is a sequence of steps and every step refers to a concept or property of the reference ontology. Steps are classified by position: odd steps are concepts, while even steps are properties. An object property cannot be the final step, and a datatype property cannot be a middle step. Concept steps may carry some restrictions. The XSD element annotated in (1) using SAWSDL is annotated in (2) with an annotation path. In (2), the first step (an odd step) is a concept step, and the second step (an even step, and the final one) is a datatype property. Another example of an annotation path is presented in (3); here the first and third steps are concepts, and the second is an object property.
<xs:element type="xs:float" name="indoorTemp" sawsdl:modelReference="/IndoorTemperature"/> (1)
<xs:element type="xs:float" name="indoorTemp" sawsdl:modelReference="/IndoorTemperature/hasValue"/> (2)
<xs:element type="xs:string" name="unitsa" sawsdl:modelReference="/TemperatureSensor/hasUnits/TemperatureUnits"/> (3)
An example XML message is given in Figure 14, and the associated XML schema (XSD) with semantic annotations expressed as annotation paths is illustrated in Figure 8. Figure 15 illustrates a further example of an XML message.
SeMFIS is a flexible engineering platform for semantic annotations of conceptual models [50]. This platform provides a link between the fields of ontologies and conceptual modeling, to the benefit of both. First, ontologies provide formal information that enriches conceptual models. Second, the visual editors and semi-formal nature of conceptual models ease interaction for non-technical users. SeMFIS annotations can be added to existing models without affecting the available structures.
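Returning to the annotation paths of Campos-Rebelo et al. [49], the step rules stated there (concepts in odd positions, properties in even positions, no object property as the final step, no datatype property as a middle step) can be checked mechanically. The following is a minimal sketch, assuming that steps are separated by "/" and that the ontology is given as plain sets of names; the function names are hypothetical, the restrictions on concept steps are ignored, and this is not the authors' implementation.

```python
def parse_path(path):
    """Split an annotation path such as '/A/b/C' into its steps."""
    return [step for step in path.strip("/").split("/") if step]


def validate_annotation_path(path, concepts, object_properties, datatype_properties):
    """Check the positional rules for annotation-path steps."""
    steps = parse_path(path)
    if not steps:
        return False
    last = len(steps)
    for position, step in enumerate(steps, start=1):
        if position % 2 == 1:
            # Odd steps must be concepts.
            if step not in concepts:
                return False
        elif step in object_properties:
            # An object property must be followed by a concept.
            if position == last:
                return False
        elif step in datatype_properties:
            # A datatype property may only close the path.
            if position != last:
                return False
        else:
            return False
    return True


# Names taken from the temperature-sensor example in the text.
concepts = {"IndoorTemperature", "TemperatureSensor", "TemperatureUnits"}
object_properties = {"hasUnits"}
datatype_properties = {"hasValue"}


def check(path):
    return validate_annotation_path(path, concepts, object_properties, datatype_properties)


results = {
    "/IndoorTemperature": check("/IndoorTemperature"),
    "/IndoorTemperature/hasValue": check("/IndoorTemperature/hasValue"),
    "/TemperatureSensor/hasUnits/TemperatureUnits": check("/TemperatureSensor/hasUnits/TemperatureUnits"),
    "/TemperatureSensor/hasUnits": check("/TemperatureSensor/hasUnits"),
}
```

The last path is rejected because it ends on an object property, which matches the rule that such a property cannot be the final step.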
Queries in SeMFIS are expressed in AQL (ADOxx Query Language). This language is not as powerful as SQL, but it is similar in that it gathers information through query definitions that target the reference elements in semantic annotations and ontologies. The output of AQL is RTF, CSV, or HTML, which the user can interpret for post-processing. Figure 16 illustrates an example of using AQL in SeMFIS.
The Protégé ontology management tool is used with SeMFIS because it is widely used across different scientific fields. Protégé has been adopted to provide import and export interfaces for exchanging model information in different file formats.
The Protégé plugin loads properties, classes, and instances from OWL ontologies into the environment, after which the user can decide which of these elements should be exported to SeMFIS. Finally, the selected elements are stored as XML or RDF files that are compatible with SeMFIS and are then visualized according to the OWL model. Protégé is software with a user interface for creating ontology classes and concepts, and the created ontologies can then be mapped to datasets from research projects.
Semantic tags can enrich the citation graph by interconnecting papers with citation reasons. The authors note that the research paper itself is one of the best sources of information for determining the purpose of a citation [52]. In this research, the authors developed a system called CCRO (Citation, Context, and Reasons Ontology), which is used to semantically tag citations in LaTeX documents in order to find the relations between articles. They also examined a variety of automatic and human authoring systems for the integration of citation reasons, as shown in Figure 18. To facilitate the annotation and organization of citation contexts and justifications, the CCRO ontology offers a systematic framework that helps scholars examine and comprehend the connections among academic publications. It can represent several citation contexts, including direct quotes, paraphrases, or summaries, as well as the reasons for referencing particular sources, such as supporting data, opposing views, or similar works. Applications of the ontology include recommendation systems, information retrieval, citation analysis, and automated literature reviews.
The typesetting program LaTeX is widely used in academic and scientific publications. It enables writers to produce texts with excellent typography, especially when writing about science and technology. With LaTeX's markup language and macros, authors can organize their writing, add mathematical equations, handle references, and produce professional-looking output. This motivates the researchers' choice of LaTeX: it has long been used by many authors across different disciplines, which yields a substantial amount of data for research, as shown in Table 3.
Loreggia et al. [53] present SenTag, a web-based tool for the semantic annotation of textual documents. In this work, a group of experts annotates the documents manually to identify the important details needed to train AI models, and SenTag provides an intuitive interface for this document annotation. The output is an Extensible Markup Language (XML) document. The manually annotated dataset plays an important role in defining the standards. ML systems provide better performance, but they depend strongly on human annotation. The NER annotator is a web-based annotation tool whose graphical user interface (GUI) assists users in creating document annotations and generating training data. The SenTag team consists of three roles: admin, editor, and annotator. The admin role has the highest level of privilege; it creates other users and grants them privileges. The editor uploads the XML schema, assigns texts to the schema, and can also assign texts to annotators. Finally, the annotator has the right to tag the documents and validate the work of the XML schema.
The study by Chen et al. [54] defines semantic annotation of tabular data as the matching of table elements with knowledge graphs. Semantically annotating tabular data is important for downstream applications such as data management and analysis. It is a challenging task because of insufficient descriptions of tabular data, heterogeneity, and vocabulary issues. The work presents MTab4D, an automated semantic annotation method that generates annotations using DBpedia. To address schema heterogeneity, data ambiguity, and noise, MTab4D integrates diverse matching signals from various table components. The paper also provides relevant research and further resources for measuring knowledge graph-based semantic annotation. The authors also provide MTab4D APIs and graphical user interfaces for reproducibility. According to their experiments on the original and revised datasets of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019), their approach excels at all three matching tasks.
The approach of the proposed system consists of three tasks: (1) matching the cells of the table to entities; (2) matching the columns to entity types; and (3) matching pairs of columns to properties. The authors propose an annotation pipeline that merges multiple matching signals from distinct table parts to handle schema heterogeneity and data ambiguity. Figure 19 depicts the annotation workflow.
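The three matching tasks can be illustrated with a deliberately small sketch. It is not the MTab4D algorithm: the knowledge graph below is a hypothetical in-memory stand-in for DBpedia, cells are matched by exact string lookup, a column's type is chosen as the most common type among its matched entities, and a column pair is annotated with the most common property linking its cells. All names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical miniature knowledge graph (illustrative data only).
ENTITY_TYPES = {"Paris": "City", "Berlin": "City", "France": "Country", "Germany": "Country"}
FACTS = {("Paris", "capitalOf", "France"), ("Berlin", "capitalOf", "Germany")}


def match_cells(table):
    """Task 1: map each cell to an entity, or None when no entity matches."""
    return [[cell if cell in ENTITY_TYPES else None for cell in row] for row in table]


def match_column_types(cell_entities):
    """Task 2: annotate each column with the most common type of its entities."""
    types = []
    for column in zip(*cell_entities):
        votes = Counter(ENTITY_TYPES[entity] for entity in column if entity is not None)
        types.append(votes.most_common(1)[0][0] if votes else None)
    return types


def match_column_pair(cell_entities, i, j):
    """Task 3: annotate a column pair with the property that links its cells most often."""
    votes = Counter()
    for row in cell_entities:
        for subject, prop, obj in FACTS:
            if subject == row[i] and obj == row[j]:
                votes[prop] += 1
    return votes.most_common(1)[0][0] if votes else None


table = [["Paris", "France"], ["Berlin", "Germany"], ["Atlantis", "Nowhere"]]
cell_entities = match_cells(table)
column_types = match_column_types(cell_entities)
pair_property = match_column_pair(cell_entities, 0, 1)
```

Because the column-level votes only count cells that were matched, the unmatched third row does not prevent the columns from being annotated, which hints at why combining signals from several table parts helps with noisy data.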

CONCLUSION
In this article, previous work on semantic annotation has been reviewed, together with the tools and methods used to annotate input text data in different formats. Various systems and technologies have been explained, from the earliest semantic annotation methods to the present, along with the role of ontology and how it is mapped to the dataset. Additionally, this article illustrates the use of LOD, RDF, and SPARQL, which are fully compatible with the structure of semantic annotation. Through the combination of RDF as a data model, LOD principles, and SPARQL as a query language, semantic annotation can benefit from interoperability, flexibility, and querying capabilities. Semantic annotations can be represented as RDF data, linked to other RDF datasets, and queried with SPARQL, so that they can be extracted and adapted to different needs or use cases. Semantic annotation systems are more interoperable and scalable as a result of their integration and compatibility with LOD, RDF, and SPARQL. The review also covers the use of deep learning methods in semantic annotation and how they are applied to problems reported in the literature. First, semantic annotation is applied to an Arabic corpus, and the data is enriched with ontologies to convert it into a machine-readable format that can be processed by NLP tasks. Second, semantic annotation is applied in XML schemas, aided by the XML Schema Definition and the metadata that describes the XML message exchange. The SeMFIS platform provides an interconnection between the fields of ontologies and conceptual modeling. The MTab4D web-based tool addresses the problem of tabular data annotation. This review also explains that Protégé is a significant tool for creating and managing ontologies, which can be mapped to the entities extracted from the dataset in use. Semantic annotation methods have evolved from simple tasks to contemporary tasks in which deep learning methods are used.
Because an ontology offers a formal, organized representation of domain knowledge, it is essential to semantic annotation. Its contribution is twofold. In terms of conceptual framework, it offers standard terminology that annotators can employ to represent the material precisely and consistently; semantic annotation uses ontologies to guarantee that meanings are clear and consistent across many applications and systems. In terms of meaning representation, ontologies provide an organized representation of knowledge through the specification of classes, subclasses, properties, and links between concepts. Annotators can provide a more accurate and insightful description of the content by using these pre-established concepts and relationships.
Looking ahead, deep learning NLP techniques, such as neural networks and deep language models, have demonstrated impressive advances on natural language understanding tasks. Future investigations may examine the use of these techniques to enhance the precision and effectiveness of semantic annotation. For example, deep learning models specialized for semantic annotation might be developed to handle large-scale annotation activities, automate the annotation process, and extract more complex interpretations. The contribution of this article is an overview of a contemporary topic that summarizes and surveys the body of work on semantic annotation. It pinpoints important research trends, approaches, obstacles, and trending themes, which contributes to a better understanding of the field's evolution for scholars and practitioners.

Figure 2. Annotation and semantic annotation representation

Figure 7 illustrates that the LOD includes classes of objects, such as information about persons, organizations, locations, and documents, while the relationship types include, for example, owner, manufacturer, and author of a book. LOD entities also have attributes such as date of birth, population of a geographic region, and so on. In conclusion, the information available in the LOD is very large and varied. The Resource Description Framework, or RDF, is a model that may be used to convey information about both concrete items and abstract ideas. It uses a graph structure to represent the relationships between things. Anything may be described with RDF, including people, pets, things, and ideas of all kinds. Information is conveyed in statements with the following syntax: <subject>, <predicate>, <object>. Such a statement expresses the relationship between the subject and the object. Resources include both the subject and the object [21]. SPARQL Protocol and RDF Query Language, or SPARQL,

Figure 13. Example of ontology representation for temperature sensors

Figure 14. XSD of the XML from Figure 5 with annotation paths referring to the ontology

Figure 17 shows the Protégé example with the SeMFIS export plugin and the selection of classes, properties, and individuals for export.

Figure 18. CCRO-based semantic citation graph
According to the study by Han et al. [5], relation extraction (RE) approaches are still applied mostly in simplified settings: they focus on training models with a large number of human annotations to classify the entities within one sentence into relations. However, in the real world, (1) gathering high-quality human annotations is costly and time-consuming; (2) many long-tail relations cannot provide large numbers of training instances; (3) most facts are stated in extended contexts consisting of multiple sentences; and (4) it is difficult to use a predefined set to cover relations that grow in an open-ended way. As a result, an effective and robust RE system needs to be built for real-world deployment. The authors provide a comprehensive and detailed review of relation extraction model development, identify four promising directions leading to more powerful RE systems (using more data, performing more efficient learning, handling more complicated contexts, and orienting toward more open domains), and investigate two key challenges faced by existing RE models.

Figure 19. MTab4D framework for tabular data annotation
The research by Di Martino et al. [55] uses a technology named SemPRNN, a web-based tool that provides semantic annotation of Business Process Model and Notation (BPMN) models with OWL ontology concepts. SemPRNN uses the domain ontology to identify concepts unambiguously; the tool is used to upload the ontology, the BPMN model, and Prolog in order to provide the annotations required for business process models. Figure 20 shows the semantic annotation in BPMN.

Person class in an RDF dataset. The query in this example makes use of the predefined prefixes FOAF and RDF. The RDF prefix denotes the RDF namespace, whereas the FOAF prefix denotes the Friend of a Friend (FOAF) vocabulary namespace.
Table 1 shows a brief comparison of SQL and SPARQL in terms of data model, querying capabilities, data representation, and scope and use cases.

Table 1. Difference between SQL and SPARQL
SQL works with structured data that has predetermined schemas and is intended for use with relational databases. Data is represented by tables with rows and columns, and operations such as selecting, inserting, updating, and deleting data are supported. SPARQL is based on a graph model and is intended for querying RDF (Resource Description Framework) data. RDF allows flexible data modeling because it expresses data as triples (subject-predicate-object). To access data, SPARQL queries navigate the graph structure.
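The triple model and the way a SPARQL query navigates it can be sketched without any RDF library. The following minimal example, under stated assumptions, stores triples as Python tuples and evaluates a list of triple patterns in which terms beginning with "?" are variables; the ex: resources, the data, and the function names match and query are hypothetical, and a real system would use a SPARQL engine instead. The comment shows the SPARQL query that the patterns are meant to imitate.

```python
# Equivalent SPARQL query, for comparison:
#   PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
#   SELECT ?person ?name
#   WHERE { ?person rdf:type foaf:Person . ?person foaf:name ?name . }

# Each statement is a (subject, predicate, object) triple (illustrative data).
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:acme", "rdf:type", "foaf:Organization"),
]


def match(pattern, graph):
    """Yield variable bindings for one triple pattern."""
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must bind to the same value everywhere in the pattern.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding


def query(patterns, graph):
    """Join several triple patterns, carrying bindings from one pattern to the next."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Replace already-bound variables by their values before matching.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, graph):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


rows = query(
    [("?person", "rdf:type", "foaf:Person"), ("?person", "foaf:name", "?name")],
    triples,
)
names = [row["?name"] for row in rows]
```

Unlike an SQL query over a fixed table schema, the patterns here make no assumption about which predicates a resource has; resources without a matching triple simply produce no solution.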

Table 2. The differences between the three methods of annotation

Table 3. Different disciplines used in LaTeX