Search-Based Applications - Gregory Grefenstette
Glossary
ACID | Constraints on a database for achieving Atomicity, Consistency, Isolation and Durability |
Agility | The ease with which a computer application can be altered, improved, or extended |
API | Application Programming Interface, specifies how to call a computer program, what arguments to use, and what you can expect as output |
Application layer | The top layer of the Open Systems Interconnection (OSI) model, in which an application interacts with a human user or with another application |
Atomicity | The idea that a database transaction either succeeds or fails in its entirety |
Availability | The percentage of time that data can be read or used. |
Batch | A computer task that is programmed to run at a certain time (usually at night) with no human intervention |
B2C | Business-to-Consumer; B2C websites offer goods or services directly to consumers |
B+ tree | A balanced, block-oriented tree data structure that supports efficient insertion, removal, and lookup of keys, including range lookups |
BI | Business Intelligence, views on data that aid users with business planning and decision making |
BigTable | A distributed data storage system developed and used internally by Google; it stores data as multidimensional key-value pairs |
BSON | Binary JSON |
Business application | Any information processing application used in running a business |
Cache | A fast computer memory where frequently or recently used data is temporarily stored |
CAP theorem | A distributed data store cannot simultaneously guarantee Consistency, Availability, and Partition tolerance |
Category | A flat or hierarchical semantic dimension added to a document, or to part of a document |
Categorization | Assigning, usually through statistical means, one or more categories to text |
CDM | Customer Data Management |
Cloud services | Computer applications that are executed on computers outside the enterprise rather than in-house. Examples are Salesforce, Google Apps, Yahoo Mail, etc. |
Clustering | Grouping documents according to content similarity |
CMS | Content Management System |
Consistency | A quality of an information system in which only valid data is recorded, so that no two conflicting versions of the same data exist |
Connector | A program that extracts information from a certain file format, or from a database |
Consolidation | Making all the data concerning one entity available in one output |
COTS | Commercial off-the-shelf software |
Crawl | Fetching web pages for indexing by following URLs found in each page |
CRM | Customer Relationship Management, applications used by businesses to interact with customers |
CSIS | Customer Service Information System |
Data integration | Merging data from different data sources or different information systems |
Data mart | A subset of data found in an enterprise information system, relevant for a specific group or purpose |
Data warehouse | A database which is used to consolidate data from disparate sources |
DBA | Database administrator, the person who is responsible for maintaining (and often designing) an organization’s database(s) |
Deep Web | Web pages that are dynamically generated as a result of form input and/or database querying |
Directory | A listing of the files or websites in a particular storage system |
DIS | Decision Intelligence System, a computer-based system for helping decision making |
Document model | A model in which a database entity is viewed as a single persistent document, composed of typed fields and categories corresponding to the entity’s attributes |
Dublin Core Metadata | A standard for metadata associated with documents, such as Title, Creator, Publisher, etc. |
Durability | A database quality that means that successfully completed transactions must persist (or be recoverable) in the case of a system failure |
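EDI | Electronic Data Interchange, an early set of standards for exchanging structured business documents (such as purchase orders and invoices) electronically between organizations |
ETL | Extract-Transform-Load, any method for extracting all or part of a database, transforming it as needed, and storing it in another database |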
Enterprise Search | Searching access-controlled, structured and unstructured data found within the enterprise |
ERP | Enterprise Resource Planning |
Evolutive Data Model | Model that can be easily extended with new fields or data types without rebuilding the entire data structure |
Facet | A dimension of meaning that can be used for restricting search; for example, on a shopping site, a product-type facet might offer the values shirts and coats |
Field | A labeled part of a document in a search engine. Fields can be typed to contain text, numbers, dates, GPS coordinates, or categories |
Firewall | A computer-implemented protection that isolates internal company data from outside access |
File server | A service that provides sequential or direct access to computer files |
Full-text engine | A system for searching any of the words found in documents, rather than just a set of manually assigned keywords |
Garbage collection | A process for recovering memory, usually by recognizing deleted or out-of-date data |
Gartner | An information technology research and advisory firm that reports on technology issues |
GPS | Global Positioning System, a system of satellites for geolocating a point on the globe |
Hash table | A data structure in which hashing converts a data item (its key) into a single number, and the table maps this number to the list of items stored under it |
Heuristics | Methods based more on demonstrated performance than on theory; weighting words by their inverse frequency in a collection is an example |
HTTP | HyperText Transfer Protocol, an application layer protocol for accessing web pages |
IDC | International Data Corporation, a global provider of market intelligence and analysis concerning information technology |
ILM | Information Lifecycle Management |
IMAP | Internet Message Access Protocol, an application layer protocol for accessing email messages stored on a mail server |
Index, inverted | A data structure that contains lists of words with pointers to where the words are found in documents |
Index slice | One section of an inverted index which can be distributed over many different computer stores |
Intranet | A secure network that gives authorized users Web-style access to an organization’s information assets (e.g., internal documents and web pages) |
IR | Information Retrieval, the study of how to index and retrieve information, usually from unstructured text |
IS | Information System, a generic term for any computer system for storing and retrieving information |
Isolation | The database constraint specifying that data involved in a transaction are isolated from (inaccessible to) other transactions until the transaction is completed to avoid conflicts and overwrites |
IT | Information Technology, a generic term covering all aspects of using computers to store and manipulate information |
JDBC | Java Database Connectivity, a Java API for connecting to databases, playing the same role for Java programs as ODBC |
Join | In a relational database, gathering together data contained in different tables |
JSON | JavaScript Object Notation, a standard for exchanging data between systems |
Key-value store | A data storage and retrieval system in which a key (identifying an entity) is linked to one or more values associated with that entity. This allows rapid lookup of the values associated with an entity, but does not allow joins on other fields |
Mash-up | A software application that dynamically aggregates information from many different sources, or output from many processes, in a single screen |
MDM | Master Data Management, a system of policies, processes and technologies designed to maintain the accuracy and consistency of essential data across many data silos |
Metadata | Typed data associated with a document, for example, Author, Date, Category |
Mobile Web | Web pages accessible through a mobile device such as a smartphone |
MySQL | A popular open source relational database |
Normalized relational schema | A model for a relational database that is designed to prevent redundancies that can cause anomalies when inserting, updating, and deleting data |
NoSQL | Not Only SQL, an umbrella term for large scale data storage and retrieval systems that use structures and querying methodologies that are different from those of relational database systems |
OBI | Operational Business Intelligence, data reporting and analysis that supports decision making concerning routine, day-to-day operations |
OCR | Optical Character Recognition, a technology used for converting paper documents or text encapsulated in images into electronic text, usually with some noise caused by the conversion |
ODBC | Open Database Connectivity, a standard middleware API through which applications can access different database management systems in a uniform way |
Offloading | Extracting information from a database application and storing it in a search engine application |
OLAP | Online Analytical Processing, tools for analyzing data in databases |
OLTP | Online Transaction Processing |
Ontology | A taxonomy with rules that can deduce links not necessarily present in the taxonomy |
Partition tolerance | The ability of a distributed database to keep functioning when network failures split it, so that some of its nodes can no longer communicate with the others |
Performance | The measure of a computer application’s rapidity, throughput, availability, or resource utilization |
PHP | PHP: Hypertext Preprocessor, a language for programming web pages |
PLM | Product Lifecycle Management, systems which allow for the management of a product from design to retirement |
Plug-and-play | Modules that can be used without any reprogramming, “out of the box” |
POC | Proof of concept, an application that proves that something can be done, though it may not be optimized for performance |
Portal | A web interface to a data source |
Primary key | In a relational database, a value that uniquely identifies one entity (row) in a table, and that allows tables to be joined for a given entity |
RDBMS | Relational database management system |
Redundancy | Storing the same data in two different places in a database or information system. This can cause consistency problems if one of the values is changed and not the other |
Relational model | A model for databases in which data is represented as tables. Some values, called primary keys, link tables together |
Relevancy | For a given query, a heuristically determined score of the supposed pertinence of a document to the query |
REST | Representational State Transfer, an architectural style used for web services in which the server preserves no client state between requests, so that every read or write operation is self-sufficient |
RFID | Radio Frequency Identification, systems using embedded chips to transmit information |
RSS | Really Simple Syndication, an XML format for transmitting frequently updated data |
R tree | An efficient data structure for storing GPS-indexed points and finding all the points in a given radius around a point |
RDF | Resource Description Framework, a format for representing data as sets of triples, used in semantic web representations |
SBA | Search-Based Application, an information access or analysis application built on a search engine rather than on a database |
SCM | Supply Chain Management |
Scalability | The desirable quality of being able to treat larger and larger data sets without a decrease in performance, or rise in cost |
Search engine | A computer program for indexing and searching in documents |
Semantic Web | Collection of web pages that are annotated with machine readable descriptions of their content |
Semi-structured data | Data found in places where the data type can be surmised, such as in explicitly labeled metadata, or in structured tables on web pages |
SEO | Search engine optimization, strategies that help a web page owner to improve a site’s ranking in common web search engines |
SERP | Search engine results page, the output of a query to a search engine |
Silo | A figurative term for an isolated information system |
SMART system | An early search engine developed by Gerard Salton at Cornell University |
SOAP | Simple Object Access Protocol, an XML-based messaging protocol for exchanging data between services |
Social media | Data uploaded by identified users, such as on YouTube, Facebook, Flickr |
SQL | Structured Query Language, commonly used language for manipulating relational databases |
Structured data | Data organized according to an explicit schema and broken down into discrete units of meaning, with units represented using consistent data types and formats (databases, log files, spreadsheets) |
SVM | Support vector machine, a supervised learning method used in classification |
Table | Part of a relational database, a body of related information. Each row of the table corresponds to one entity, and each column, to some attribute of this entity |
Taxonomy | A hierarchically typed system of entities, such as mammals being part of animals being part of living beings |
TCO | Total cost of ownership, how much an application costs when all implicit and explicit costs are factored in over time |
Timestamp | A chronological value indicating when some data was created |
Top-k | The k highest ranked responses in a database system that can rank answers to a query |
Transaction | In databases, a sequence of actions that should be performed as an uninterruptible unit, for example, purchasing a seat on a flight |
Unstructured data | Data that is not formally or consistently organized, such as textual data (email, reports, documents) and multimedia content |
URL | Uniform Resource Locator, the address of a web page |
Usability | The desirable quality of being able to be used by a large population of users with little or no training |
Vertical application | An application built for a specific domain, such as pharmaceuticals, finance, or manufacturing. A horizontal application could be used in a number of different domains. |
XML | eXtensible Markup Language, a standard markup language for encoding documents and data, including their metadata |
W3C | World Wide Web Consortium |
WYSIWYG | What You See Is What You Get |
YPG | Yellow Pages Group, Canada |
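The Atomicity and Transaction entries can be illustrated with a small sketch using Python's built-in sqlite3 module. The seat-purchase scenario comes from the Transaction entry; the table layout, the CHECK constraint used to force a failure, and the buy_seat function are hypothetical, chosen only to show that a failing step undoes the whole transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, passenger TEXT)")
# Hypothetical constraint: payments must be positive, so a bad amount makes the insert fail.
conn.execute("CREATE TABLE payments (passenger TEXT, amount REAL CHECK (amount > 0))")
conn.execute("INSERT INTO seats VALUES ('12A', NULL)")
conn.commit()

def buy_seat(conn, seat, passenger, amount):
    """Reserve a seat and record its payment as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE seats SET passenger = ? WHERE seat = ?", (passenger, seat))
            conn.execute("INSERT INTO payments VALUES (?, ?)", (passenger, amount))
        return True
    except sqlite3.IntegrityError:
        return False

print(buy_seat(conn, "12A", "Alice", -50.0))  # False: the CHECK constraint rejects the amount
print(conn.execute("SELECT passenger FROM seats WHERE seat = '12A'").fetchone())  # (None,)
print(buy_seat(conn, "12A", "Alice", 420.0))  # True
```

Because the payment insert fails, the seat update made earlier in the same transaction is rolled back, so the database never records a reserved seat without its payment.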
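The Crawl entry describes fetching pages by following the URLs found in each page. A minimal breadth-first sketch follows; to stay runnable offline, an in-memory dictionary of hypothetical pages stands in for HTTP fetches, and crawl and LinkExtractor are illustrative names, not part of any crawler product.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" standing in for real HTTP fetches.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="http://example.com/">home</a>',
    "http://example.com/b": '<a href="/c">C</a>',
    "http://example.com/c": "no links here",
}

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, record it, queue every unseen URL it links to."""
    seen, queue, fetched = {seed}, deque([seed]), []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreachable page: skip it
            continue
        fetched.append(url)  # a real crawler would hand the page to the indexer here
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links against the page URL
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return fetched

print(crawl("http://example.com/", PAGES.get))
```

The seen set keeps the crawler from fetching a page twice when pages link back to each other, as page a does here.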
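Faceted search, as described in the Facet entry, usually shows how many results fall under each value of a facet and lets the user restrict the results by choosing one value. The sketch below assumes a hypothetical list of product documents with typed fields; facet_counts and refine are illustrative helpers, not the API of any search engine.

```python
from collections import Counter

# Hypothetical product documents; "type" and "color" act as facets.
products = [
    {"title": "Oxford shirt", "type": "shirt", "color": "blue"},
    {"title": "Linen shirt", "type": "shirt", "color": "white"},
    {"title": "Wool coat", "type": "coat", "color": "blue"},
    {"title": "Rain coat", "type": "coat", "color": "yellow"},
]

def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    return Counter(doc[facet] for doc in results)

def refine(results, facet, value):
    """Restrict the results to documents whose facet has the chosen value."""
    return [doc for doc in results if doc[facet] == value]

print(facet_counts(products, "type"))  # Counter({'shirt': 2, 'coat': 2})
blue = refine(products, "color", "blue")
print([doc["title"] for doc in blue])  # ['Oxford shirt', 'Wool coat']
print(facet_counts(blue, "type"))      # counts recomputed on the restricted results
```

Recomputing the counts after each refinement is what lets an interface show, for instance, how many blue shirts and blue coats remain.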
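The Hash table entry can be made concrete with a small separate-chaining table. This is a teaching sketch under simple assumptions (a fixed number of buckets, no resizing, Python's built-in hash() as the hashing function); real implementations, including Python's own dict, differ in many details.

```python
class ChainedHashTable:
    """Minimal hash table: hash(key) selects a bucket; each bucket is a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash() converts the key into a single integer; the modulo maps it onto a bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite the value of an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

table = ChainedHashTable()
table.put("MySQL", "relational database")
table.put("BigTable", "key-value store")
table.put("MySQL", "open source relational database")
print(table.get("MySQL"))   # open source relational database
print(table.get("Lucene"))  # None
```

Lookup only scans the one bucket the key hashes to, which is why hash tables stay fast as long as buckets remain short.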
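The Index, inverted, Heuristics, Relevancy, and Top-k entries fit together in a few lines. The sketch below builds an inverted index over three hypothetical documents, scores matches by term frequency weighted by inverse document frequency (the inverse-frequency heuristic mentioned under Heuristics), and returns the k best-scoring documents. The tokenizer, the exact weighting formula, and the function names are simplifying assumptions; production engines use more elaborate scoring.

```python
import heapq
import math
import re
from collections import defaultdict

def tokenize(text):
    # Deliberately simple: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each word to a postings dict {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index

def search(index, num_docs, query, k=3):
    """Sum tf * idf over the query words and return the top-k (doc_id, score) pairs."""
    scores = defaultdict(float)
    for word in tokenize(query):
        postings = index.get(word, {})
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))  # rarer words receive more weight
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

docs = {
    "d1": "search engines build an inverted index",
    "d2": "a relational database stores tables",
    "d3": "the inverted index lists each word and the documents containing it",
}
index = build_inverted_index(docs)
print(sorted(index["inverted"]))  # ['d1', 'd3']
print(search(index, len(docs), "relational index", k=2))
```

Because "relational" occurs in only one document while "index" occurs in two, d2 ranks first even though each query word matches just once.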
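Several entries (Join, Primary key, Consolidation, Offloading, Document model, Key-value store) describe moving relational data into a search-oriented form. As a hedged sketch, the code below joins two hypothetical normalized tables on their key in SQLite, folds each customer's rows into a single document, and then retrieves that document by its key alone. The table and field names are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT);
    INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'monitor'), (12, 2, 'printer');
""")

# Join: gather rows from both tables that share the same customer_id.
rows = conn.execute("""
    SELECT c.customer_id, c.name, o.product
    FROM customers c JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# Consolidation: fold each customer's rows into one self-contained document.
documents = {}
for customer_id, name, product in rows:
    doc = documents.setdefault(customer_id, {"id": customer_id, "name": name, "products": []})
    doc["products"].append(product)

# Key-value lookup: the document is fetched by its key, with no join at query time.
print(json.dumps(documents[1]))
# {"id": 1, "name": "Acme Corp", "products": ["laptop", "monitor"]}
```

The inner JOIN drops customers who have no orders; a LEFT JOIN would keep them.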